20100122

The Great Logging Worker Thread Adventure

In my previous entry, I talked about some ways I tried to increase performance or responsiveness of the MUD. One of the ideas I talked about was putting my logging functionality into a separate thread. I went through a few iterations of it before I finally arrived at a solution that is both fast and preserves the functionality I want. One of the intermediate solutions also had a really cool idea (if I may say so myself), and I want to share that/document it.

It's funny: I was going to call this post "the great logging worker thread fiasco"... but then I managed to make it work, so I had to change the title. I guess that's a good thing.

The justification for doing this was to increase responsiveness. If the logging code is synchronous/blocking, then whenever I log something, the user has to wait for that logging to happen. The user doesn't care! They just want to play. In fact, they don't even know it's happening; all they know is my MUD sucks because it's dog slow.

So I wanted to use threading to handle this in the background, i.e. at a lower priority. It could happen whenever it was convenient. More so, I thought of using a worker thread. Obviously, that link is for C++ and Windows, but hopefully I'm not abusing the term for the idea I'm thinking of. The key point is that the logging worker thread doesn't exit, but instead hangs around waiting for new work to arrive. This avoids the cost of creating and destroying a thread.

Actually, Ruby threads are supposed to be "green threads", not kernel threads, so maybe this particular aspect doesn't really matter. I could take some profiles to verify that. In fact, the fact that these aren't hardware threads could potentially make this entire idea a non-starter. I'd have to do some simulations to verify that too.

Let's first consider the naive implementation.

def DebugOutput.debugOut(msg, cats = LComment)
    trace = caller(1)
    enqueueAction() {
        debugOut_internal(msg, trace, cats)
    }
end # end function debugOut

def DebugOutput.workerThreadProc()
    isRunning = true

    while isRunning
        toLog = nil

        @@mutexEventDoLogging.synchronize() {
            isRunning = @@isRunning
        }

        @@mutexEventDoLogging.synchronize {
            @@condDoLogging.wait()
            toLog, @@toLog = @@toLog, []
        }

        toLog.each() { |prc|
            prc.call()
        }
    end
end # function workerThreadProc

I've skipped some of the innards here that perhaps you can imagine. debugOut is the producer, if you will, and workerThreadProc is the consumer--the thread that sits around (the wait call) and does the actual printing (the prc.call() call). What's wrong with this code? Well, one major and a few secondary problems.

To see the major problem, we need to look first at what the logging output actually looks like.

connec@20   accept              Waiting to accept a connection.

There are four parts here: filename ("connection.rb", truncated), line number, function name (accept), and the actual message being logged. The first three of these are generated automatically in the logging code. It's useful to have, to quickly see in the log file where a message was logged from to go look at the code in that file. Herein lies the problem, though.

Look back up to the code and notice the caller(1) call. It turns out this is really expensive (like 80% of the time spent in the logging code). You might also be able to tell that this work isn't deferred in the worker thread; it's on the main thread before the enqueueAction call. The unfortunate fact is that this is necessary. The caller function grabs a full stack trace at the point where it's called. I use that information to parse out the file, line, and function name for the logs. One can perhaps imagine how that might be expensive. And, since it grabs the stack at the point where it's called, it must be done on the main thread; on the worker thread, the stack will just look like the worker thread's stack.

Try Continuations

So I got to thinking, how can I continue to make the correct caller call, but do it on the worker thread, so it doesn't get in the way? In my recent programming languages course, I learned about something very cool that Ruby 1.9 got: continuations. The "continuation sandwich" description in the wikipedia article is pretty cool. Let me summarize the idea for those too lazy to click on the link. A continuation is basically taking a snapshot of the execution state at some time and storing it in a variable. Later, you can jump right back to that point and resume execution from the next line. But it's not really time travel; everything you've done in the meantime is still persisted.

To apply continuations to this problem, roughly the following steps would take place. I'm not altogether clear on whether this is an example of a coroutine.

  1. On the main thread, create a continuation at the point where the caller call should happen, without actually calling it.
  2. Enqueue the logging call as usual, also passing in the continuation object.
  3. In the worker thread, create another continuation to mark the current place.
  4. Call the first continuation, also passing in the second continuation.
  5. Now we're back in the code main thread (but hopefully still running in the logging thread). It's set up (see #1) so that right now we call caller.
  6. Having procured the caller information, now the main code invokes the second continuation, passing the caller information too.
  7. This moves us back to the worker thread proper and the normal flow of execution.

So yeah... that was confusing. Maybe I should just post the code and annotate it with numbers? OK.

def DebugOutput.debugOut(msg, cats = LComment)
    cont = callcc { |cc| # 1
        enqueueAction() {
            debugOut_internal(msg, cc, cats) # 2
        }
    }

    if cont.kind_of?(Continuation) # 5
        trace = caller(1) # also 5
        cont.call(trace) # 6
    end
end # end function debugOut

def DebugOutput.debugOut_internal(msg, cc, cats = LComment)
    trace = callcc { |cont| # 3
        cc.call(cont) # 4
    }

    # 7
    # write logs using trace
end # function debugOut

The callcc function creates a continuation. It takes a block, in which you receive (via the block's parameter) the continuation object that was just created. The return value from the callcc call is not the continuation object that was just created! The return value is whatever value is passed to continuation.call later. For example, the return value from callcc at point #3 is the variable trace that was passed to cont.call at point #6. The syntax is maddening, but it works.

So if you can puzzle all this out in your head (it took me a while), it's pretty rad. Too bad it just flat out doesn't work. Oh, it works on paper, and even works in the single-threaded case, but if you try to call a continuation from a thread other than the one on which it was created, Ruby just throws an exception saying you're not allowed to do that.

The plane was full of Miracle Whip, but it never left the hangar.

Try __callee__

I was looking through methods and noticed that Kernel exposes a method called __callee__ that tells the current method being executed. So if every call to debugOut included the value of this method at that point, it would at least get me the function name that a logging call was executed in. The problem is, I really don't want to update every single call site with an extra parameter that is always the same code.

I was pretty sure this wouldn't work, but I gave it a try anyway: I put __callee__ as the default value for another parameter to debugOut, hoping it would get executed in the context of the caller, not the callee (debugOut).

def DebugOutput.debugOut(msg, cats = LComment, callee = __callee__())

Nope, this always gets the value debugOut. Oh well.

Think about macros

I started thinking a bit about C++, and the way I'd do this there. We actually have an exception library at work that does have this feature of printing out the site that threw the exception. To make it work, we used the __FILE__ and __LINE__ macros.

There are macro processors for Ruby out there. For instance, rewrite looks interesting. But it doesn't seem to be under active development, and there is some question about its speed. It also has some external dependencies, and after last time, I don't feel like putting myself through that pain just yet.

Try Proc.source_location

At some point along the way, something I'd seen tickled my mind. I remember printing out a Proc object and noticing that it showed the file and line number where it originated. Surely I can programmatically get a hold of this information. Indeed, I can! It's source_location.

But to get at this, I'd need a proc that was created at the right place. We've already established that I can't create an empty one in an optional parameter to debugOut. And the prettiness of the code matters, so I'm not going to create that by hand everywhere.

Instead I decided to turn the log message into a proc itself. Check this out:

# Before:
def DebugOutput.debugOut(msg, cats = LComment)
    writeLogs(msg)
end

# After:
def DebugOutput.debugOut(cats = LComment, &procMsg)
    writeLogs(procMsg.call())
end


# Before:
DebugOutput.debugOut("called this crazy function", LComment)

# After:
DebugOutput.debugOut(LComment){"called this crazy function"}

See, this is syntax I can live with. I don't mind curly braces too much, and this has the added benefit of deferring the evaluation of the string until the proc is called... I think. Best of all, this works. I had to change every single call to my logging to fit the new syntax, but it's great. I'm down to 5 sec in startup time, and the majority of that is spent actually interpreting the source code, not in actually executing MUD code.

Minor problem: synchronization

Way back above I alluded to minor problems with the naive implementation. It turns out that all the synchronization and waiting and signaling I'm doing here is slow. When I eventually did get this threading approach working, I found that it sometimes accounted for over 50% of the time. So I removed it. It's just logging, after all; if I lose a couple log messages I highly doubt I'll miss them.

After removing it, the code for the worker thread and the code to control the worker thread is really simple. You don't want to see what it was before, when I had to avoid pesky deadlocks and stuff.

def DebugOutput.startLoggingThread()
    DebugOutput.ASSERT(!@@thDebugLogging, "trying to start logging thread twice")
    @@isRunning = true

    @@thDebugLogging = Thread.new() {
        DebugOutput.workerThreadProc()
    }
end # function startLoggingThread

def DebugOutput.stopLoggingThread()
    @@isRunning = false
    @@thDebugLogging.join()
    @@thDebugLogging = nil
end # function stopLoggingThread

def DebugOutput.enqueueAction(&procLog)
    @@toLog.push(procLog)
end # function enqueueAction

def DebugOutput.workerThreadProc()
    while @@isRunning
        sleep(0.5)

        toLog, @@toLog = @@toLog, []

        toLog.each() { |procLog|
            procLog.call()
        }
    end
end # function workerThreadProc

One big change is now that instead of being notified that there is work to do, the thread wakes up periodically to check for work (the sleep call in the loop).

I guess the moral of the story is that sometimes you have to try tons and tons of crazy stuff until you finally hit on something that actually works.

0 comments:

Post a Comment