20091014

Test First, Try Later

As I was implementing aliases, I started getting better about testing things. In a previous post, I mentioned how I found testing my code to be kind of tedious work that I was loath to do. Well, I think I've had a change of heart. Part of it is probably that writing tests for old code that's already been working at some level for ages is drudgery. At work, too, nobody particularly likes that. On the other hand, testing shiny new features can be pretty fun.

This time, for pretty much every checkin that I did, some test update went along with it, usually in the form of adding a test to verify the specific behavior that I changed with that checkin.

One particularly bad bug I fixed was a strange case of reentrancy. Every room has a set of exits, each of which points to another room. It's conventional for rooms to have reciprocal exits, that is, a bidirectional passage between two rooms. In some cases, rooms with these reciprocal exits wouldn't fully load each other, so they'd end up with one-way passages. The nature of the fix is a little complicated to explain, but when I thought I had it working, I wrote tests with reciprocal exits and with rooms in a unidirectional loop (room 1 -> room 2 -> room 3 -> room 1) to cover those kinds of cycles.

Actually, the main difference in my methodology for this feature was writing automated tests before ad-hoc testing a change. In the past, I'd notice a bug by running the MUD itself: log on, do something, and notice some symptom of something wrong. Then I'd fix the bug and verify the fix by loading the MUD again, trying the same scenario, and checking whether the symptom was gone. This time instead, I'd write a new test that should catch the bug, then check that it failed beforehand and passed after the fix. Afterwards, of course, I'd try it out in the real MUD to verify my assumption that the new test reflected the same scenario as the problematic symptom.

It's probably worth saying that I don't do this at work. If I find a bug in one of my areas, sometimes it might get translated into a regression test, but not always. I'm not even sure if that happens half of the time. There are a few factors that contribute to that, chief among them that my product is not nearly as testable. I'll show a few examples of how I tested some of my MUD bugs here, and it should be self-evident that it's easier to write tests like these than automated tests that have to click buttons and load web pages in order to make a bug appear.

Alias processing

One set of tests I wrote was for processing the alias strings. That is, given the input command and arguments and the output alias template, I wrote tests to make sure that all the input arguments got filled into the right places. These are pretty close to unit tests in the traditional sense.

def process_test(strCmd, strArgs, strExpected)
    strActual = AliasCommand.processCommandString(strCmd, strArgs)
    assert_equal(strExpected, strActual, "output of processCommandString")
end # function process_test

def test_1arg_trailer()
    process_test("abc &1 d", "a b", "abc a d b")
end

With most of my tests I use this pattern of having a helper function take care of the common work. Every one of my alias processing tests consists merely of some setup (strCmd), some input (strArgs), and an expected result.
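
Just for illustration, the substitution being tested boils down to something like the following. This is only a sketch of the idea, not the real processCommandString; the behavior (replace &N with the Nth argument, append any leftover arguments at the end) is inferred from tests like test_1arg_trailer above.

def processCommandString_sketch(strCmd, strArgs)
    # sketch only: replace each &N placeholder with the Nth argument, then
    # append any arguments that weren't consumed by a placeholder
    args = strArgs.split(' ')
    used = []

    filled = strCmd.gsub(/&(\d+)/) { |m|
        i = $1.to_i() - 1
        used.push(i)
        args[i].to_s()
    }

    leftovers = (0...args.length()).reject() { |i| used.include?(i) }
    return ([filled] + leftovers.map() { |i| args[i] }).join(' ')
end # function processCommandString_sketch

# processCommandString_sketch("abc &1 d", "a b") => "abc a d b"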

AliasModCommand

AliasModCommand is the command used by a player to view and modify his aliases in-game. I decided to test it with a series of mock objects that simulate quite a bit of a character actually playing the MUD.

def createManager()
    char = XmlCharacterIO.loadCharacter($td.cd1.name)
    manager = ConnectionManager.new(MockConnection.new())
    manager.enterCharacter(char)
    manager.mainInteractionMode.aliases = []
    return manager
end # function createManager

I load a character from a well-known set of test data, since there must always be a character when dealing with these types of simulations. I create a ConnectionManager, because that gives me a way to inject commands into the MUD. The MockConnection is a stubbed-out connection object. Normally this is the object that performs the network I/O, but in my case all the methods do nothing, so it's basically just a dummy.
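
The mock itself is about as boring as it sounds; it's essentially the following, though the particular method names here are placeholders for whatever the real connection interface requires.

class MockConnection
    # stands in for the real connection object; the real one does the network
    # I/O, this one just swallows everything
    # (the method names here are placeholders, not the real interface)
    def sendText(text)
        # do nothing
    end

    def disconnect()
        # do nothing
    end
end # class MockConnection

With the manager set up, the helper for the show-aliases tests looks like this: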

def showAliases_test(aliases, expectedText)
    manager = createManager()

    aliases.each() { |pr|
        manager.process("alias #{pr[0]} #{pr[1]}")
        manager.pulse()
    }

    filt = LastLineOutputFilter.new()

    manager.addOutputFilter(filt)

    manager.process("alias")
    manager.pulse()

    assert_equal("Aliases:" + expectedText + "\r\n", filt.prev(), "output of alias command")
end # function showAliases_test

There's a lot going on here in this helper function, but I think it's kind of interesting. After creating the manager, we actually instruct it to process lines of text that add the aliases the test requested. The pulse call basically causes the manager to process the next command.

Now it gets trickier. In an old post, I talked about filters and how they're used to get all of the text that some player sees. Well, I'm using one here, too. The LastLineOutputFilter sees all of the text that a given manager outputs, and stores the last output call - basically a short history of the last things the player would have seen on his screen. I enter a command (in this case "alias") and catch the output to see what it was. Note that here it's the job of the test to pass in the correct expected output of the alias command to verify against. Still, with the amount of work taken care of by the helper function, the tests themselves are pretty concise.

def test_showAliases_simple()
    aliases = [["a", "b"]]
    showAliases_test(aliases, "\r\n  a => b")
end
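
For reference, the LastLineOutputFilter itself doesn't need to be anything more than roughly this. It's a sketch; the name of the method the manager calls on its filters is my assumption here.

class LastLineOutputFilter
    # remembers the most recent chunk of text the manager sent to the player;
    # the filter() method name is an assumption about the filter interface
    def initialize()
        @prev = nil
    end

    def filter(text)
        @prev = text
        return text # pass the text through unchanged
    end

    def prev()
        return @prev
    end
end # class LastLineOutputFilter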

TC_XmlArray

Implementing XmlArray was a big part of getting aliases done, so of course I wrote some tests to make sure it was working on its own. I could have written these tests using actual XML files, but I thought it would be easier just to embed the XML right into the tests.

def test_load_XmlReadable()
    strXml =
<<HERE
<root>
<arr>
    <e>1</e>
    <e>2</e>
    <e>3</e>
    <e>4</e>
    <e>5</e>
</arr>
</root>
HERE

    doc = REXML::Document.new(strXml)

    classTextArray = XmlArray.newClassWithType('e', Multiplier_XmlReadable)

    things = XmlReader.loadThings([doc.elements['root']], "arr", classTextArray, true)

    assert_equal(1, things.length(), "number of arrays loaded")

    arr = things[0]
    assert_equal(5, arr.length(), "number of elements in array")

    arr.each_index() { |i|
        assert_equal(((i+1) * 3), arr[i].num(), "index #{i} in the array")
    }
end

This illustrates something more useful than just embedding XML in the test. Any old schmuck can do that. Notice that the array is of type Multiplier_XmlReadable. This is a mock class I created that implements XmlReadable about as simply as possible. Its XML representation is just a number. When it is loaded into an object, it multiplies the number that was read by some constant. To the loading code, this looks similar to a bunch of other objects throughout the MUD that are persisted.
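
The class itself is roughly the following. This is only a sketch: the method the loader calls to hand over the XML element is my assumption, and the constant of 3 is inferred from the (i+1) * 3 assertions in the test above.

class Multiplier_XmlReadable
    Coefficient = 3 # inferred from the assertions above

    def initialize()
        @num = nil
    end

    # sketch: however the real XmlReadable interface hands over the element,
    # all this class does is read the number and multiply it
    def loadFromXml(element)
        @num = element.text().to_i() * Coefficient
    end

    def num()
        return @num
    end
end # class Multiplier_XmlReadable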

Complicated loading tests - delayed load

I used these test objects pretty heavily for some other tests. For example, there is code in loading that makes repeated passes over all of the things being loaded, trying to load them and all their properties over and over until all dependencies have been resolved. If all the things being loaded are simple (no outward dependencies) then none of that iterative code ever gets exercised. In order to write tests targeting those codepaths, I need control over the objects being loaded.
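
The shape of that iterative code, stripped of everything specific to the MUD, is roughly this. It's only a sketch of the idea, with made-up method names, not the real loading code.

def loadAll_sketch(things)
    # keep making passes over the unloaded things until a pass makes no
    # progress (made-up method names; sketch of the idea only)
    pending = things.dup()

    while !pending.empty?()
        loadedThisPass = pending.select() { |thing|
            thing.tryLoadProperties() # may not finish if a dependency isn't loaded yet
            thing.fullyLoaded?()
        }

        break if loadedThisPass.empty?() # no progress; give up

        pending = pending - loadedThisPass
    end

    return pending # anything left here has unresolved dependencies
end # function loadAll_sketch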

Similar to Multiplier_XmlReadable, I created Multiplier_PartiallyLoadable, which gives me some hooks to help here. For the example above, I want certain elements not to fully load on the first try. I could do this by actually recreating a network of dependencies, but at some point that becomes complicated in itself, and I want to do as little debugging of the tests themselves as possible.

# in the test
    strXml =
<<HERE
<root>
    <c sets='1'><e>1</e></c>
    <c sets='2'><e>2</e></c>
</root>
HERE

# in Multiplier_PartiallyLoadable
def num=(other)
    if (other.kind_of?(Integer))
        # refuse to really set the property until the counter reaches the
        # threshold, which is driven by the "sets" attribute in the XML
        if @setCounter < setThreshold()
            @num = PartiallyLoadable::NotLoaded
            @setCounter = @setCounter + 1
        else
            @num = other * Coefficient
        end
    elsif other == PartiallyLoadable::NotLoaded
        # explicitly marking the property as not loaded is always allowed
        @num = other
    else
        raise "setting num to #{other} of wrong type"
    end
end # function num=

So instead I simulate it with a trick. Multiplier_PartiallyLoadable has a single property that points to num, so the loading system knows to call num= when setting the property to some value. What I do here is fail to actually set the property for the first n times, where n is given by the "sets" attribute in XML.

So in the example above, in the first iteration in loading, the first element would get loaded completely, but the second would need one more pass.

Complicated loading tests - linked load

There are two times at which a thing can be pronounced fully loaded. In the top-level loop, if a thing has all of its properties loaded, then its status is changed to loaded. Or, if loading is in the middle of loading the properties of something, each of those properties that becomes loaded has its status likewise changed -- not in the top-level loop but down in the recursive property loading code.

In and of themselves, neither of these is too difficult, and both are already covered above. What is a little harder is the case where one of the properties of some thing (which would be loaded in the property loading code, not the top-level iterations) is itself also in the top-level list of things. This requires some relationship between the things via their properties. For example, this arises when loading a list of rooms: all the rooms are in the top-level list, and they also appear in other rooms' properties as exits. I alluded to a bug in this scenario earlier in this post.

# in the test
    strXml =
<<HERE
<root>
    <c sets='1' linked_load='2'><e>1</e></c>
    <c sets='1'><e>2</e></c>
</root>
HERE

# in Multiplier_PartiallyLoadable
def loadingComplete()
    if linkedLoad() != nil
        # ll is the actual object associated with the linked_load attribute
        # (the lookup is elided here)
        if ll
            ll.loadingComplete()
            ll.isLoadingComplete = true
        end
    end
end # function loadingComplete

This snippet of the code is hopefully fairly concise. When one of these objects becomes loaded (loadingComplete is called exactly once per object), if its XML indicated the number of another object "linked" to this one via the linked_load attribute, it marks that object as loaded too.

Error condition tests

I throw (sorry, "raise") exceptions from a few places, so I need tests for those places. Although Test::Unit provides some assertions for raising exceptions, their focus is on the exception that gets raised, not on what actually happened in the case where an exception (incorrectly) isn't raised. Whenever an exception test fails, the actual outcome involves some data, and I want that data as part of the failure message. So I wrote a helper.

def loadtest_bad(el)
    gotException = false

    al = nil
    begin
        al = XmlReader.constructFromElement(el, AliasCommand, true)
    rescue
        gotException = true
    end

    assert(gotException, "should have gotten exception for bad input, but got #{al}")
end # function loadtest_bad

I assert that the test got an exception, and if it didn't, the data it got instead is included in the failure message.

Unit testing doesn't scale

All this testing is fine, but there's a problem with it, and it's the reason why we don't write that many tests like this at work. Testing individual behaviors like this isn't the most efficient use of time in many cases, and time is always scarce.

If I were testing something like this at work, I would probably model the area. I'd break each aspect up into its separate pieces and create a matrix of possibilities for each scenario. For instance, when testing the command for manipulating aliases in-game, there are several parameters, and each of them can be absent, empty, or present with some specific characteristics. Understanding which characteristics are important is the essence of modeling.
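
As a rough illustration (the particular categories here are made up for the example, not a real model): for the alias command, the interesting parameters might be the alias name and the template, and the model is just the cross product of the states each one can be in.

nameStates = ["missing", "single letter", "very long", "contains spaces"]
templateStates = ["missing", "no placeholders", "one placeholder", "repeated placeholder"]

# every combination of states becomes a candidate test case
nameStates.product(templateStates).each() { |name, template|
    puts "case: alias name #{name}, template #{template}"
}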

The level of test coverage that I have so far is still in its infancy, so I'm not yet at the point where the remaining bugs will only be found with larger models. I'm still covering the basics.
