20090530

Everything at its Time

I had an interesting bug with eventos that I think is worth sharing. When I first noticed it, without thinking about it too much, I thought the cause was one thing--which had a simple solution. Turns out I was wrong about that.

Recall from a previous post about eventos that I implemented a toy response for the say evento for nipics. That is, when they hear someone say something, they respond in indignation. In the linked post, it looks quite sane. Compare that to what I discovered it somehow regressed to:

<type commands for a list> say wat

a man says, 'What do you mean, "wat"???'
You say, 'wat'

Clearly the order of responses has changed somehow. I assumed this was because the code that delivers the evento does so in an arbitrary order. In the following snippet, the Mob, when it wants to say something, has the room have everyone in it hear the evento.

self.room.receiveEvento(SayEvento.new(self, text))

It turns out that the room is allowed to deliver the evento to its listeners (the mobs standing in the room) in whatever order it wants. Consider the following, obvious implementation of EventoSpeaker::receiveEvento, which is behind the previous piece of code.

def receiveEvento(evento)
    results = Hash.new()

    self.eventoListeners.each() { |listener|
        result = listener.receiveEvento(evento)
        results[listener] = result
    }
    ...

If eventoListeners here is a Set (which it often is) then its elements are ordered arbitrarily, so iteration may go in no relevant order. I took this to be the main cause of the failure, so I changed the implementation here so that the source of the evento is always first in the iteration. This did not fix it; now the source is guaranteed to see the evento in the correct order, but from a bystander's point of view, the incorrect ordering may still take place.

The actual nature of the bug is the fact that the call to receiveEvento may itself produce an evento in response. For instance, a nipic would respond with its own say evento -- the reply you see in the MUD output above. This happens inline during the delivery of the first evento, and completes its delivery to all present in the room before the first message has been delivered to everyone.

Try 1: add some threads

The way I tried to fix it was by delivering the first evento to everyone at the same time, and having the second evento block until the first completes. Here is the actual implementation, which uses threads.

mutexResults = Monitor.new()
listenerThreads = []

self.mutexEventoSpeaker.synchronize() {
    self.eventoListeners.each() { |listener|
        listenerThreads << Thread.new() {
            result = listener.receiveEvento(evento)

            mutexResults.synchronize() {
                results[listener] = result
            }
        }
    }
}

listenerThreads.each() { |t|
    t.join()
}

Now the first command will be delivered to everyone in parallel. Those who produce reply eventos that would go to the same room will have to acquire the room's mutex (mutexEventoSpeaker) first, and will therefore block until the original deilvery has completed.

Doh, that's not quite good enough

Unfortunately, this isn't enough to prevent all kinds of problems. While this method is, I think, correct for the situation I just described, it doesn't work if the reply evento goes to a different target than the same room. Consider if the nipic instead broadcasted its reply to the entire MUD. Since the whole MUD has a different evento speaker, that lock will be free and the reply will be delievered before the original evento.

There are a few ways by which I could fix that. One way I could do it is with more locking. I could make every evento listener have its own mutex. Then, whenever anyone wants to deliver an evento to someone, he first acquires the target's lock before delivering. If you've done any thread programming you know that this isn't the whole story. Actually, I would have to impose a total ordering on all the evento listener mutexes, to prevent a deadlock. Since I would like to leave the design open to pretty much anything being an evento listener (even dealies), this means I could have hundreds or thousands of mutexes, and acquiring one would mean acquiring all of the previous ones in the total ordering. I don't believe in premature optimization, but I get the feeling this solution is doomed from the start.

The simplest way I see this working is to force replies to take place on the next pulse. Eventos are delivered in the first pulse, and all replies are delivered in the next pulse. This means that there would need to be a queued and non-queued type of delivery. Actually, let me just go implement this right now while it's fresh in my mind, since it seems like a decent idea.

Try 2: Ahh, that's much better

OK, back. Man, I feel better now. Now I set a flag at the start of receiveEvento that says that this listener is in the process of receiving an event. If someone tries to sendEvento during this time, they will end up calling queueSendEvento rather than delivering immediately.

At first, this kind of flag-setting seems at risk from multiple callers in parallel, but it turns out the MUD is only multithreaded in certain ways and at certain times. I can guarantee that receiveEvento above will never be called in parallel, because this part of the MUD is driven by a single thread that "pulses" every subscribed object on the mud serially.

For the queueing, I decided to go back to the MainInteractionMode well, which has this queue pretty well implemented already, and also does a set of actions every pulse. I just added a new invocation type that sends an evento as specified.

<type commands for a list> say wat
You say, 'wat'

<type commands for a list>
a man says, 'What do you mean, "wat"???'

a guy says, 'What do you mean, "wat"???'

Ta-daaa. Oh look, it looks like our friend, "a man", has a new buddy.

Digression: blogging finds bugs

Part of the reason I write these entries is for documentation about choices I made, since it's not necessarily appropriate to go into some of the details in comments in the code or in changelist descriptions. The other reason is that this kind of narrative builds an internal dialogue that makes me think of potential problems in the design or potential solutions.

It's like--have you ever been stuck on a problem, but as you start to explain the problem to someone else, you find the solution? This happens at work all the time, but since I don't really have a human being here that I can talk to about this, I talk to myself in writing.

Of course, the reason I even bring this up is that I found the nicer solution to this post's problem in the course of writing it. That's also why the writing here is a little bit rambling. At times it's almost a stream of consciousness. After I've finished and published the post, I give it a once-over and try to smooth out some of the rough edges and clarify things that I found might only be obvious to me.

0 comments:

Post a Comment