20090630

Target Strings

I haven't posted substantially in a while. I got a bit busy with school and house stuff and messing around with the Ruby bugs I ran into before. Oh yeah, and I did some random bugfixes and cleanup. I don't have a lot to report, but I did make some updates to target string processing.

Target strings are a way of identifying a thing in the MUD world. You need to do this all the time: looking at someone, picking up an object, putting an object in some container, and attacking someone all use target strings to identify the thing in question. I've somewhat copied the style of target strings that I'm familiar with from ROM.

A [key]word about keywords

I've talked about keywords before, but it's been a while, and this post involves them a lot, so I'll say a bit more.

Each thing in the game world has a set of keywords that are used to identify it. The player never sees these words, but the implicit contract is that at least some of the keywords can be gleaned from the short description or long description.

Keywords | Short Description | Long Description
sandwich ham cheese | a ham and cheese sandwich | A ham and cheese sandwich sits here, tempting you.
businessman man suit | a well dressed, important looking fellow | A man wearing an expensive suit and wielding a briefcase stands here looking important.
amethyst | a beautiful purple crystal | Glinting in the light, a light-purple gem is lying here.

Notice that there is no requirement that the keywords are easily guessed given the short and long description. Sometimes level builders like to play these kinds of games, though as often as not it can be seen as bad form. As an aside, keywords are unordered; in fact, internally they are stored as a Set.

So why all this nonsense about keywords? Well, for fun, but also for notation. In the following section I will represent a set of keywords as {keyword1, keyword2, ...}.

Target Strings

As I mentioned before, target strings are a way of identifying something when you enter a command. Target strings operate on keywords only, never long or short description. When I say "matches" below, I mean "prefix matches." So "man" matches {man} but also {manticore}.
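Here is a minimal sketch of what that prefix matching could look like in Ruby. The helper name keywords_match? is made up for illustration and is not taken from the actual code; it assumes the target has already been broken into words.

```ruby
require 'set'

# Hypothetical helper: true when every word of the target is a prefix of
# at least one keyword in the set. An empty word list matches anything.
def keywords_match?(keywords, words)
  words.all? do |word|
    keywords.any? { |keyword| keyword.start_with?(word) }
  end
end

keywords_match?(Set['man'], ['man'])                 # => true
keywords_match?(Set['manticore'], ['man'])           # => true
keywords_match?(Set['cool', 'man'], ['man', 'guy'])  # => false
```

Because keywords are an unordered set, the order of the words in the target does not matter either.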

Examples:

man
    a thing in this room where one of its keywords matches "man"
2.man
    the second thing in this room with a keyword matching "man". If there is only one such thing in this room, then this is not found.
'man guy'
    the first thing in this room that has a keyword that matches "man" and a keyword that matches "guy". E.g. matches {cool, guy, man} but not {cool, man}.
'man guy
    same thing. NB: no closing quote
all
    a special identifier that indicates the group of all things in this room. This will actually have a different set of things matched for each command that operates on it. For instance, "get all" means to pick up all dealies, but "kill all" means start attacking all mobs.
all.
    same as "all"
all.knife
    all the things in this room that have a keyword matching "knife"
all.'sharp knife'
    matches all the things in the room with a keyword matching "sharp" and a keyword matching "knife". E.g. matches {sharp, blade, knife} but not {dull, knife}.
self
    a special identifier that always refers to my own mob
1.
    the first thing of any kind in this room
'
    same as 1.

Once you have the target string and the set of keywords, doing the prefix matching against the set of keywords uses the obvious approach.
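To make the examples above concrete, here is a rough sketch of parsing a target string and resolving it against the things in a room. Everything named here (Target, parse_target, resolve, and representing a thing as a hash with :name and :keywords) is an assumption for illustration, not the real code. It also ignores the per-command filtering that "all" gets.

```ruby
require 'set'

# Hypothetical parsed form of a target string.
Target = Struct.new(:kind, :index, :words)

def parse_target(str)
  return Target.new(:self, nil, []) if str == 'self'
  return Target.new(:all, nil, []) if str == 'all'

  kind = :nth
  index = 1
  rest = str
  # Optional "all." or "N." prefix in front of the keywords.
  if (m = str.match(/\A(all|\d+)\.(.*)\z/m))
    if m[1] == 'all'
      kind = :all
      index = nil
    else
      index = m[1].to_i
    end
    rest = m[2]
  end
  # Drop the opening quote and the closing quote, if there is one.
  words = rest.sub(/\A'/, '').sub(/'\z/, '').split
  Target.new(kind, index, words)
end

# Returns an array of matching things (empty when nothing is found).
def resolve(target, things, me)
  return [me] if target.kind == :self

  hits = things.select do |thing|
    target.words.all? do |word|
      thing[:keywords].any? { |keyword| keyword.start_with?(word) }
    end
  end
  return hits if target.kind == :all
  return [] if target.index < 1

  hit = hits[target.index - 1]
  hit ? [hit] : []
end

room = [
  { name: 'businessman', keywords: Set['businessman', 'man', 'suit'] },
  { name: 'manticore',   keywords: Set['manticore', 'beast'] },
  { name: 'sandwich',    keywords: Set['sandwich', 'ham', 'cheese'] }
]

resolve(parse_target('2.man'), room, nil).map { |t| t[:name] }     # => ["manticore"]
resolve(parse_target("'ham che"), room, nil).map { |t| t[:name] }  # => ["sandwich"]
```

An empty word list matches everything, which is how "1." and a lone quote end up meaning "the first thing of any kind."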

argify

Each command takes zero or more arguments. For example, "look" can take zero arguments, which means to look around in the current room, or it can take one argument, a target string indicating the thing to look at. "Get" takes one or two arguments: the item to get, and optionally the container to get it out of (both target strings). "Tell" takes two arguments: the person to tell something to, and the string to send to them.

The process of turning an input string into an array of arguments to pass to the command is called (by me and the code) "argification," because the function that does it is called argify. The caller passes in the string to parse and the maximum number of arguments it wants. argify gives back an array of at most that many elements.

So how would you go about parsing input into arguments? Here are some of the rules:

  1. Arguments are space-delimited ("one two" => ["one", "two"])
  2. Quoted strings become a single argument ("'one two'" => ["one two"])
  3. Quoted strings in the middle of an argument are assimilated into that argument ("1.'one two'" => ["1.'one two'"])
  4. If the string is too long for the max requested items, everything at the end is bundled into the last argument ("a b c d" with max 3 args => ["a", "b", "c d"])
  5. A non-terminated quote means to take all of the rest of the string as part of that argument ("'hello goodbye hello" => ["hello goodbye hello"])
  6. Anything else is probably undefined.

Do you scan through the string, keeping track of whether you've seen a quote and are waiting for an end quote? That solution is tricky for inputs like "a'b'c'" (I mean, what should that even do? Probably just give back the entire string in the first argument). So then you need to track quotes only at certain points. Which points? Word boundaries? When preceded by whitespace? What if the string starts with a quote? I'm asking all these questions because these are all the bugs I found in my original implementation of argify, which tried to use a StringScanner and a single pass through the input.

I eventually abandoned that plan in favor of splitting the string by whitespace and combining arguments together that are part of the same quoted string or are at the tail end of the string.

Even this has a few caveats, though. Consider this naive way of splitting (warning: assumes familiarity with regular expressions):

tsp = text.split(/\s+/)
#...
# join arguments back together that belong together

This doesn't work when an argument needs its whitespace preserved, like this command: "tell man hello       mister". By the end, you will get ["man", "hello mister"] instead of ["man", "hello       mister"]. The trick is to use some special regex magic:

tsp = text.split(/(?=\s)/)

This is called a "zero-width lookahead." It matches at the position just before each whitespace character (the \s inside the (?=...)) without actually consuming it, so the items in the array after the split keep their leading whitespace, which I can then trim off.
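Putting the pieces together, here is a simplified sketch of the split-then-recombine approach. It is not the actual argify; the unquote helper and the exact handling of odd inputs are my assumptions here, chosen so the numbered rules above hold.

```ruby
# Hypothetical helper: strip the quotes only when the whole argument is one
# quoted string (the closing quote is optional). Anything else is left alone.
def unquote(arg)
  m = arg.match(/\A'([^']*)'?\z/)
  m ? m[1] : arg
end

def argify(text, max)
  return [] if max < 1

  args = []
  # Every piece after the first keeps its leading whitespace character.
  pieces = text.strip.split(/(?=\s)/)
  until pieces.empty?
    # Last slot: bundle everything that is left, inner whitespace intact.
    if args.size == max - 1
      args << unquote(pieces.join.strip)
      break
    end

    piece = pieces.shift
    next if piece.strip.empty? # extra whitespace between arguments

    arg = piece.lstrip
    if arg.count("'").odd?
      # A quote opened here: absorb pieces until one closes it, or until
      # the input runs out (unterminated quote).
      until pieces.empty?
        following = pieces.shift
        arg << following
        break if following.include?("'")
      end
    end
    args << unquote(arg)
  end
  args
end

argify("a b c d", 3)                    # => ["a", "b", "c d"]
argify("man hello       mister", 2)     # => ["man", "hello       mister"]
argify("'hello goodbye hello", 2)       # => ["hello goodbye hello"]
```

Since the lookahead split never throws characters away, joining the leftover pieces gives back the original tail of the string, which is what keeps the spacing in the "tell" example.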

Who cares?

This may seem uninteresting to you, but there's one aspect of it I find interesting. When writing programs that accept input of any kind, if possible it is usually desirable to accept that input in a very strict way. In fact, that was a key design goal of XML. Unlike HTML, if your XML document is malformed, a conformant XML parser will reject it outright. HTML parsers, on the other hand, sometimes go to great lengths to guess what you meant when you feed them something that makes no logical sense (e.g. <b><i>overlapping end tags</b></i>).

With target strings, a key design goal is that they are relatively lax. They serve a dual purpose: firstly, to unambiguously identify a thing; and secondly, to give a shorthand for identifying a thing. I was faced with decisions around questionable inputs. For example, I decided to allow an unterminated quoted string to grab all input until the end of the line. I had to make sure that an opening quote running till a proper closing quote is captured properly into a single argument, but pathological cases like "a'b'c d'" are captured as two arguments. Actually, that last one probably falls into the undefined category.

At this point, I think I've implemented most or all of anything I ever used when I was actively MUDding. I wonder if I will get requirements for something more elaborate eventually.
