20110709

Spot the Bug: Wacky Data from the Network

At work there's a discussion group called "Spot the Bug". The idea is that developers from all around the company can submit interesting bugs they've fixed, and others can try to puzzle out what the nature of the bug is. I came across a bug the other day in an aspect of programming I often neglect. I thought I'd share it here. I'll briefly describe the background of the code, then show you the code itself. In a series of hidden sections afterwards I give hints and eventually the solution, but please, I encourage you not to go look at the answer until trying it yourself. I also encourage you to copy down the code, compile it, and add to it in an effort to solve the problem.

One note: there is one main bug, and it's not a trick question like a semicolon missing or a badly named variable. The bug is not super obvious, which is why it's worth mentioning.

We have some code that receives a structure from the network. There are a few specific variations of the general structure that differ only by the size of variable length data tacked on to the end. In the GeneralThing structure, you'll see the idiom: BYTE variable[1]. It indicates that the field is really a variable length piece of data contiguous on the end.

Sure enough, if you look at SpecificThing, its variable field is 512 bytes long. You can see how you could cast a SpecificThing to a GeneralThing and safely read its variable length data by using the cbVariable ("cb" = "count of bytes") field to make sure you don't overrun.

In the code, we do some extra checking to validate that the sizes we received are safe before we process the data. Somewhere in this program there's a bug. Can you spot it?

#define UNICODE 1
#include <windows.h>
#include <stdio.h>

struct GeneralThing
{
    DWORD dw1;
    DWORD dw2;
    DWORD cbVariable;
    BYTE variable[1];
};

struct SpecificThing
{
    DWORD dw1;
    DWORD dw2;
    DWORD cbVariable;
    BYTE variable[512];
};

/*
for example, here is another such specific thing we might get from the network

struct SpecificThing
{
    DWORD dw1;
    DWORD dw2;
    DWORD cbVariable;
    BYTE variable[128]; <----- different sized variable data
};
*/

/*
** the caller supplies a GeneralThing, but since it is actually some
** SpecificThing, it also passes the overall size of the structure including
** the variable length data
*/
bool ProcessData(const GeneralThing* pThing, DWORD cbThing)
{
    if (pThing->cbVariable <= cbThing - (sizeof(GeneralThing) - sizeof(pThing->variable)))
    {
        wprintf(L"processed data - last byte is %d\n",
                pThing->variable[pThing->cbVariable - 1]);

        return true;
    }

    return false;
}

/*
** pbBuffer - destination buffer to be filled from network
** pcbBuffer - initially, max size of the buffer.
**             when the function returns, it is the number of bytes stored in
**             pbBuffer
*/
void ReadDataFromNetwork(BYTE* pbBuffer, DWORD* pcbBuffer)
{
    memset(pbBuffer, 3, *pcbBuffer);

    /*
    ** normally *pcbBuffer could change, but for this example, assume we filled
    ** the buffer completely (hence it is unchanged from the original max size)
    */
}

int __cdecl wmain(int argc, wchar_t* argv[])
{
    SpecificThing thing = {0};
    thing.dw1 = 1;
    thing.dw2 = 2;

    thing.cbVariable = sizeof(thing.variable);
    ReadDataFromNetwork(thing.variable, &thing.cbVariable);

    ProcessData(reinterpret_cast<GeneralThing*>(&thing), sizeof(SpecificThing));

    return 0;
}

Hint 1: click to show. But give it an earnest try first!

The first hint is to examine the size check closely. Understand what it's verifying, then see if there are any problems with it. Compile and run the program, and see what values are being used for the calculation.

Hint 2: click to show.

Think about padding and alignment. Look them up if you need, and see what their implications are.

Hint 3: click to show.

In case you couldn't find one, here's a good link about padding and alignment.

Solution: click to show.

GeneralThing strictly speaking has 13 bytes of data in it, but due to alignment, is padded to the next 4-byte boundary: 16 bytes. In fact, the compiler even reports this size in the sizeof keyword. The size check fails because it subtracts the non-variable-data part off of the total size given, but subtracts too much, since it includes the padding.

Here is a version of the program that shows both the bug more clearly and the bug fix.

#define UNICODE 1
#include <windows.h>
#include <stdio.h>

struct GeneralThing
{
    DWORD dw1;
    DWORD dw2;
    DWORD cbVariable;
    BYTE variable[1];
};

struct SpecificThing
{
    DWORD dw1;
    DWORD dw2;
    DWORD cbVariable;
    BYTE variable[512];
};

/*
** the caller supplies a GeneralThing, but since it is actually some
** SpecificThing, it also passes the overall size of the structure including
** the variable length data
*/
bool ProcessData(const GeneralThing* pThing, DWORD cbThing)
{
    bool fSizeCheck = (pThing->cbVariable <= cbThing - (sizeof(GeneralThing) - sizeof(pThing->variable)));

    wprintf(L"checking (pThing->cbVariable <= cbThing - (sizeof(GeneralThing) - sizeof(pThing->variable))):\n");
    wprintf(L"(%d >= %d - (%d - %d)) -> %s\n", pThing->cbVariable, cbThing, sizeof(GeneralThing), sizeof(pThing->variable), (fSizeCheck ? L"PASS" : L"FAIL"));

    return fSizeCheck;
}

bool ProcessData_FO(const GeneralThing* pThing, DWORD cbThing)
{
    bool fSizeCheck = (pThing->cbVariable <= cbThing - FIELD_OFFSET(GeneralThing, variable));

    wprintf(L"checking (pThing->cbVariable <= cbThing - FIELD_OFFSET(GeneralThing, variable)):\n");
    wprintf(L"(%d >= %d - %d) -> %s\n", pThing->cbVariable, cbThing, FIELD_OFFSET(GeneralThing, variable), (fSizeCheck ? L"PASS" : L"FAIL"));

    return fSizeCheck;
}

/*
** pbBuffer - destination buffer to be filled from network
** pcbBuffer - initially, max size of the buffer.
**             when the function returns, it is the number of bytes stored in
**             pbBuffer
*/
void ReadDataFromNetwork(BYTE* pbBuffer, DWORD* pcbBuffer)
{
    memset(pbBuffer, 3, *pcbBuffer);

    /*
    ** normally *pcbBuffer could change, but for this example, assume we filled
    ** the buffer completely (hence it is unchanged from the original max size)
    */
}

int __cdecl wmain(int argc, wchar_t* argv[])
{
    SpecificThing thing = {0};
    thing.dw1 = 1;
    thing.dw2 = 2;

    thing.cbVariable = sizeof(thing.variable);
    ReadDataFromNetwork(thing.variable, &thing.cbVariable);

    ProcessData(reinterpret_cast<GeneralThing*>(&thing), sizeof(SpecificThing));
    wprintf(L"\n");
    ProcessData_FO(reinterpret_cast<GeneralThing*>(&thing), sizeof(SpecificThing));

    return 0;
}

The use FIELD_OFFSET macro (I believe) gives the offset of the named field within a structure. In this case, the offset of the variable field in the structure is 12 -- exactly the amount we want to subtract.