20160904

Into the Blue Clouds

It's nice to get feedback from your peers. I made a thing a while back, but I didn't tell many people about it because I had reservations. I solicited feedback from my peers about it, and I decided the thing I made had problems I needed to resolve before I felt good about releasing it. Well, I've now solved those problems, so I'm ready to tell the world about the thing I originally created, the feedback I got, and how I incorporated it to make something neat.

What is a "fight club tracker"?

I play a lot of Dark Souls, and I have for years. Why wouldn't I? It's one of the best games ever made. In the competitive aspect of the game, people like to duel, and streamers like to host duels for spectator enjoyment. We call these "fight clubs", after the excellent movie.

What if you were hosting a fight club while streaming and wanted all the people watching to be able to see the running statistics of who's winning and by how much? Well, I tried to answer that question by making a web page to help track it. Here's what it would look like in action: a buddy of mine hosted a fight club and kept track of the stats.

Not bad, but what's up with the atrociously long URL? And if you change anything on the page, the link regenerates! How can the viewers in the stream keep up with that? Short answer is they can't, and that was a big problem with v1 of the fight club tracker. It's only half-usable and barely worth mentioning, though the ugly (I mean, minimalistic) UI will carry forward to the later versions.

Let's keep us all in sync, okay?

I made v1 of that thing probably in 2013, a long time ago. Then I forgot about it for a while. I don't know what tickled my brain on Oct 16, 2015, but that's when I started working on v2 that kept all the viewers in sync and updated in real-time.

This was a much larger undertaking than v1, which was implemented entirely on the client side (within the browser, with no help from a server). This one needed a server-side component, something to store the ongoing results of the fight club, so that spectators would automatically pull the most recent data and stay current without the URL ever changing.

I have very simple web hosting with 1 and 1. I store some files on it and do some subdomain redirects, but it also gives me a LAMP stack. I don't really know PHP, but I can use Ruby to fulfill the same role easily enough. So I have a MySQL database, a Ruby server, and good ol' HTML + Javascript for the client (same as v1).

I wasn't kidding about my web hosting being simplistic; I only have CGI available. That means I can't really maintain a long connection between the client and the server; I'll have to keep making HTTP requests to change something or get the updated state, and each one will have to start a new connection to the SQL server that lasts only for that request. And how will a viewer be notified that there's a change to get? Well, there's basically no way to do this well with CGI, so they'll have to keep polling the server every once in a while to grab the latest stats.

All in all, this is a pretty bad solution for server load, but it does end up working. I made a Ruby script that parses the HTTP method and query string out of the REQUEST_METHOD and QUERY_STRING parameters, respectively. When a request comes in, it makes a new connection to the MySQL database, runs the appropriate query, and returns the results as JSON to the client. I would say it's, um, RESTful in spirit.

I was worried about people filling up my database, so I added some rudimentary rate limiting, too. There's a pool of "tickets" for each fight club, and you need a ticket to be allowed to write to the database. Every time you try to write, a ticket is consumed, and they recharge at a fixed rate, up to a max, effectively 60 writes per minute. If you exceed this, I don't even make a connection to the database. There's also a global pool, so even if you are within the per-fightclub ticket limit, you may be limited by overall load across all fight clubs. Pretty rudimentary, and there are holes in it.

Don't you dare release this

So I got all this working, hooray! My first web app. I knew about the theory of how web apps should work, but had never actually tried to build one that had any degree of complexity (and this one barely qualifies). I sent it to some of my friends, who gave me a little feedback, but I was hesitant to release it to the world, because I was worried about my web hosting getting slammed or hacked in case I had some bug. I decided to get a code review for it first.

I managed to find the perfect person to review my code: Naiyt, the maintainer of the MugenMonkey Dark Souls planners. He's in the Dark Souls community and has a lot of experience building web apps used by a lot of people. I sent him all my code and an overview of how it works. He sent back some good suggestions about the overall design:

  • The whole CGI execution of a new Ruby instance on every request and associated new database connection on every request is a lot of overhead for each request. I can guess it would be bad if the site were to somehow become popular.
  • My rate limiting scheme in this version used a file on disk and flock() to synchronize writes to it, for tracking the available tickets. My thinking was I should have some way of checking tickets before even connecting to the database, but he pointed out that lock contention around this file may even contribute to other performance problems. Without load testing it, I wouldn't know for sure, but it's a fair consideration.
  • My rate limiting scheme at the time only had a single global bucket. He rightly pointed out that this means one troll could lock out legitimate users. I switched it after that to one ticket pool per client IP address, which is checked before the global ticket pool. Then a troll would get stopped without interfering with others.

    Let's just ignore that CGNs/LSNs might let one troll lock out others in his geographic vicinity. I don't know how to solve that problem.

Even with a technically working product, given all that feedback I was hesitant to release it. I let it cower in obscurity for a while, then started working on a new version with some new ideas.

Paradigm and platform shift

Something important happened while I was letting dksfight snooze: I learned about the wonders of NodeJS, and made something with it. Something silly, to be sure, but it opened the door to try out other uses. I'd heard about NodeJS being used as a web server, so I thought, "why not try it out as a web server for dksfight?"

So I deployed it on my web hosting and got to work.

Wait, that's not at all what happened. There's no way I'm installing it on my web hosting--far too janky and they wouldn't let me open my own port for a permanently-running process. No way.

Instead, I looked into Microsoft Azure. As a Microsoft employee, one benefit I get is $150 of Azure credit per month. I figured I might as well put that to use. Also, Azure fulfills my requirement of taking care of maintenance of the environment--I just declare what NPM packages I want, and Azure handles keeping them up to date with security patches and so on. It's isolated, so if I get hacked, I shouldn't lose any other assets. If I get DoS'd or hit with too much load, I won't get charged; it'll just run out my allowance and shut down my site. This is all ideal for my usage.

I started researching Azure's capabilities with NodeJS. It turns out it's pretty much a first class citizen there, which is awesome. Clearly, Microsoft is trying to make it seamless and easy for developers to take technologies they already know and integrate them with Azure. I don't know how this compares with Amazon Web Services, but it worked pretty well.

For the benefit of people who haven't used Azure before, I'll run through some of the stuff I had to do and stuff I learned along the way.

The center of your web site is a Web App. I think it's basically some IIS rules on some virtual machine in a datacenter somewhere. IIS gets configured to forward requests to a NodeJS instance that it spins up automatically. The PORT environment variable tells your NodeJS process which port to listen on. You declare your NPM package dependencies in your package.json file. To deploy to Azure, you set up a Git remote; when you push to it, deployment scripts run automatically to install the NPM dependencies. It's surprisingly simple.

Speaking of package.json, another thing you can specify there is the NodeJS version to use. To find out which versions of NodeJS and NPM are supported, you can look at the "Runtime Versions" link from the amazing Kudu site inspector available at https://YOURSITE.scm.azurewebsites.net.
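For reference, a stripped-down package.json for a site like this might look something like the sketch below. The version numbers are just placeholders--check that Runtime Versions page for what's actually supported.

{
  "name": "dksfight",
  "version": "1.0.0",
  "main": "server.js",
  "dependencies": {
    "express": "^4.14.0",
    "socket.io": "^1.4.8",
    "mssql": "^3.3.0"
  },
  "engines": {
    "node": "5.9.1"
  }
}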

Pruning branches

One problem I ran into with this flow is how to separate what I keep in revision control from what I deploy to the cloud. Really, the only files I need deployed are server.js, dksfight.html, web.config, and package.json. I have several other files, like server.ts (the original TypeScript source that produces server.js) and todo.txt (my higher-than-high-falutin' bug database), which I don't want to deploy to the web site even if there's no URL to access them.

So what do I do? I make a new child branch and merge in only the files I want deployed. Every time I merge from the parent branch, I delete any unneeded files. This feels like a hack, but none of my friends could tell me a better way to do it with Git.

A basic web server in a not-so-basic cloud

There doesn't seem to be consensus on any one right way to do a lot of things in the web app space (big shocker), not even on which web server to use. Eventually I found that Express is one of the most popular ones. It also works nicely with socket.io and Let's Encrypt, which we'll get to later.

I only have one file to serve, so the code to start using Express is basically one line, just telling it to send / to /dksfight.html.
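In rough form, it looks something like this (a minimal sketch, not my exact code):

import * as express from 'express';

let app = express();

// The whole site is one page: send requests for / to the tracker page itself.
app.get('/', function (req : express.Request, res : express.Response) : void
{
    res.sendFile(__dirname + '/dksfight.html');
});

// Azure's PORT environment variable says which port to listen on; fall back to an
// arbitrary port for local testing.
app.listen(parseInt(process.env.PORT || '8080', 10));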

Connecting to a database

The other part I needed was a database to store the results of the fight clubs for posterity. Azure provides fairly easy connection to a Microsoft SQL Server database. With a few clicks, you can generate the connection string that includes the password to the DB, which you will access from the web app.

Just like the web server, I had to choose a NodeJS package to use for accessing the database, and just like before there were many choices. I ended up using mssql because it seemed popular, mentioned Azure on its page, and had TypeScript typings readily available. Well, those typings were somewhat incorrect or out of date, but close enough.

mssql uses tedious as the "driver", the thing that actually connects to the database and speaks the TDS protocol that MS SQL Server understands. mssql is more like a wrapper on top of tedious.

By default, Azure gives you a "connection string" that you can pass straight to mssql. It looks something like this: Driver=tedious;Server=tcp:DATABASENAME.database.windows.net,1433;Database=DATABASENAME;Uid=USERNAME;Pwd=PASSWORD;Encrypt=true;TrustServerCertificate=false;Connection Timeout=30;. There are a few problems with this.

The first thing I ran into right away is the connection timeout. If you read that connection string, you'd assume the unit of time is probably seconds, but you'd be wrong; it's milliseconds. So my first few attempts to connect to the database ended in flames very quickly. I don't know which component is interpreting it in the wrong units, but it's annoying when just getting started.

The connection string also doesn't specify any of the optional parameters you can pass to mssql or tedious, things like whether to turn on debug logging. Later on I stopped using the connection string and instead passed all the individual parts manually:

// Environment variables are strings, so the numeric fields need parsing.
let config : sql.config =
{
    driver: process.env.SQLAZURE_driver,
    user: process.env.SQLAZURE_user,
    password: process.env.CUSTOMCONNSTR_dksfight_pass,
    server: process.env.SQLAZURE_server,
    port: parseInt(process.env.SQLAZURE_port, 10),
    database: process.env.SQLAZURE_database,
    connectionTimeout: parseInt(process.env.SQLAZURE_connectionTimeout, 10),
    options:
    {
        encrypt: true,
    },
};

The reason the password is in a slightly different environment variable format is that Azure hides "connection strings" variables by default in the UI, because they often contain passwords. Any connection string variable gets prefixed with something like CUSTOMCONNSTR_ or SQLAZURECONNSTR_. It's a bit goofy.

My first SQL query

After resolving the differences between the old MySQL code and T-SQL, I was ready to actually start issuing queries. It's not my first SQL query by a long shot, but the first one in this new world of Azure, so that counts for something, doesn't it? mssql gives a few ways to do a query, primarily requests and prepared statements.

Hey, I recognize prepared statements from the MySQL version! They're good because they let you sanitize input values to make sure you don't have SQL injection attacks. I should use those! And wait, I better put things in transactions, too, because I can have multiple clients reading and writing from the database at the same time. I'll sprinkle those on there, too. From this was born the monstrosity that is doQueryInTransaction.

function doQueryInTransaction(sqlTransaction : sql.Transaction, query : string, params : PreparedStatementArg[], shouldCommit : boolean) : Promise<ExecuteResult>

Prepared statements are slow

The golden path of this function is just fine, but a little more complicated than you might expect. You see, to use a prepared statement is a four-step process. What!? The steps are input, prepare, execute, and unprepare. What the heck is unprepare?? Why do I need all that?

The thing I didn't understand about prepared statements at the time is that they're really intended for executing a single SQL query many times with different data. The prepare call actually turns into an sp_prepare stored procedure call on the database that results in a stateful handle that is later used in sp_execute and sp_unprepare calls. So that's three round trips just to execute a single SQL query. It makes sense for bulk-inserting data, maybe, but definitely not for my usage.
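Here's roughly what that four-step dance looks like with mssql (a sketch--the query and parameter value are made up for illustration):

// Sketch only: illustrative query and value, not code from the app.
let ps : sql.PreparedStatement = new sql.PreparedStatement(sqlTransaction);
ps.input('name', sql.NVarChar);

ps.prepare('SELECT id FROM fighter WHERE name = @name')        // round trip 1: sp_prepare
.then(function() { return ps.execute({ name: 'name1' }); })     // round trip 2: sp_execute
.then(function(recordset) {
    // ... use the results ...
    return ps.unprepare();                                       // round trip 3: sp_unprepare
});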

I switched eventually to just doing a regular request, which, as it turns out, also has input sanitization available with just a single round trip. The web app got much faster just with that switch.
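The equivalent with a plain request still gets parameterized, injection-safe input, but it all happens in one round trip (again, a simplified sketch):

let request : sql.Request = new sql.Request(sqlTransaction);
request.input('name', sql.NVarChar, 'name1');
request.query('SELECT id FROM fighter WHERE name = @name')
.then(function(recordset) {
    // ... use the results; only one round trip to the database ...
});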

Why stop at one request?

At some point after switching to regular requests instead of prepared statements, I noticed that mssql has a feature that lets you do multiple requests in a single round trip. I was all about eliminating round trips, so I started consolidating requests whenever possible. For example, when fetching the fight club info, there are two distinct queries to do: get all the rounds, and get all the upcoming fighters. I can combine these into a single request that comes back with multiple record-sets.

All you have to do is specify request.multiple = true, and in the response callback, you have an array of recordSets, one per query. They also share the input parameters. I'm going to omit the real code because it's kind of a large block of SQL, but the pattern looks roughly like the sketch below.
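(The queries and table names here are placeholders, and the exact shape of the result depends on the mssql version.)

let request : sql.Request = new sql.Request(sqlTransaction);
request.multiple = true;
request.input('fcid', sql.Int, fcid);
request.query(
    'SELECT * FROM round WHERE fightclub_id = @fcid; ' +
    'SELECT * FROM upcoming_fighters WHERE fightclub_id = @fcid;')
.then(function(recordSets) {
    let rounds = recordSets[0];            // results of the first query
    let upcoming = recordSets[1];          // results of the second query
});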

Callbacks vs Promises

mssql supports two different calling patterns, simple callbacks and ECMAScript 6 promises. Promises are little more than a callback codified as a function that's called on success and a function that's called on failure, but there are some useful things you can do with them, such as waiting for all or any promises in a list to complete.

One of the operations needed against the database is called ensureFighter. When adding a new round to a fightclub, the user enters two names of people who are fighting. The database has to store those names and generate unique IDs for them if it's never seen them before. Since a round is between two people, it stands to reason that you need two independent ensureFighter calls.

When I started the project, I was using mssql with simple callbacks, but when I had to write ensureFighters it seemed like a good use for Promise.all, so I switched all of the code calling mssql to use promises. It was a pain. Really gnarly syntax and code, especially in the error paths, but it let me write this one tragically beautiful block of code:

let fighterAId : number = null;
let fighterBId : number = null;
let resolveFighterA : Promise<number> = ensureFighter(sqlTransaction, fighterA);
let resolveFighterB : Promise<number> = ensureFighter(sqlTransaction, fighterB);

Promise.all([resolveFighterA, resolveFighterB]).then(
    function(fighterIds : number[]) : void
    {
        fighterAId = fighterIds[0];
        fighterBId = fighterIds[1];
        // ...
    });
That code looks deliciously asynchronous in the abstract, but by a subtle twist it's totally serial: both queries are in the same transaction, and so they execute over the same connection, one after the other. The asynchronous complexity of that code has no purpose to it, sadly. Maybe the only value is it expresses that there is no ordering requirement between the two. That's pretty low value, if you ask me.

Stay tuned for why this code got totally blown away, and Promise.all was no longer used at all. Like I said, tragic.

The bugaboo that is set_upcoming_fighters

Once I had doQueryInTransaction coded up, most of the operations you can do on a fight club (add, update, and remove rounds, create fight clubs, change password) fell into place pretty easily. But another feature of the page lets you enter people who haven't fought yet but are coming up soon, so you don't forget about them. It's basically just an ordered list of fighter names. When a person fights, they are automatically removed from the list.

This feature was originally implemented as set_upcoming_fighters. That is, every time an update was needed to the list (someone added to the list or someone fought, so should be removed), the front-end would pass the entire new list to the server. Seems kind of dumb, right? In almost all cases, there's only one delta happening: one fighter added or removed. But what if you have two people in the list, and both of them fight in the next round? Now you need to remove two from the list at once.

It's hard to say whether passing the whole list or chaining multiple updates together is more complicated. The main advantage of passing the whole list is a single roundtrip to the server for any update; the downside is that the payload on that roundtrip is a little larger.

Another problem is that, from an API cleanliness standpoint, a method called set_upcoming_fighters should support setting any arbitrary list of fighters, even if the usage is always setting the existing list with small changes. That means I need to be able to populate the database with an arbitrarily long list of never-seen-before fighters all at once. That is, it might really put ensureFighters to work.

ensureFighters(sqlTransaction, upcomingFighters);

I made the mistake of plowing ahead with optimizing set_upcoming_fighters rather than thinking again about the tradeoffs above. I made ensureFighters work with an arbitrarily long list of new fighters. It first looks up the whole list and only tries to insert names that aren't in the database already. There might be a race condition if someone else is also inserting the same name, where one of the INSERT queries will fail, but it's a low risk and can just be retried.

Safe, dynamically generated SQL in ensureFighters

The way ensureFighters handles the arbitrary list of names to insert is with a little dynamically generated SQL. Wait, don't run away! This is safe, I swear! The query to insert a bunch of fighters looks like this:

INSERT INTO fighter (name)
OUTPUT INSERTED.id, INSERTED.name
VALUES
  (@fighter_name0),
  (@fighter_name1),
  (@fighter_name2),
  (@fighter_name3),
  (@fighter_name4),
  (@fighter_name5)
;

And then I supply the fighter_nameN arguments with the actual names. Each name gets bound as a parameter, which protects against SQL injection; the only dynamically generated part of the query is the number of placeholders, not any user-entered data. This might have been a decent place to use a prepared statement, but that's still N round trips rather than 1. I'm willing to take the risk that nobody is going to insert so many names at once that the query will break. In fact, later I'll show how I ensure it.
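Here's roughly how that placeholder list gets generated and bound (a sketch--the real ensureFighters does more bookkeeping, and buildInsertFightersQuery is a name I made up for illustration):

// Sketch: one parameterized placeholder per name. Only the placeholder names are
// built dynamically; the user-entered values go through request.input, so they
// can't inject SQL.
function buildInsertFightersQuery(request : sql.Request, names : string[]) : string
{
    let valuesClauses : string[] = [];
    for (let i : number = 0; i < names.length; i++)
    {
        request.input('fighter_name' + i, sql.NVarChar, names[i]);
        valuesClauses.push('(@fighter_name' + i + ')');
    }

    return 'INSERT INTO fighter (name) OUTPUT INSERTED.id, INSERTED.name VALUES ' +
        valuesClauses.join(', ') + ';';
}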

Back to set_upcoming_fighters - bulk table updates

So now that I have the fighters populated in the database, I can actually do the set_upcoming_fighters operation. This is a DELETE query on all upcoming fighters associated with a given fight club, followed by a bulk insert of the new list. mssql has "bulk insert" functionality that I think takes fewer than N round trips to insert a bunch of data. This is basically what it looked like:

let tableForInsert : sql.Table = new sql.Table('upcoming_fighters');
tableForInsert.create = false;
tableForInsert.columns.add('fightclub_id', sql.Int, {nullable: false, primary: false});
tableForInsert.columns.add('fighter_id', sql.Int, {nullable: false, primary: false});
tableForInsert.columns.add('exclude', sql.TinyInt, {nullable: false, primary: false});

for (let i : number = 0; i < upcomingFighters.length; i++)
{
    tableForInsert.rows.add(fcid, upcomingFighters[i].id, upcomingFighters[i].exclude ? 1 : 0);
}

let sqlBulkRequest : sql.Request = new sql.Request(sqlTransaction);
return sqlBulkRequest.bulk(tableForInsert);

I'm a huge believer that writing about your code helps you find bugs, especially design bugs. Sure enough, in writing this huge post I gave some extra thought to set_upcoming_fighters and decided it would be better to move it to doing only individual operations: addUpcomingFighter, updateUpcomingFighter, and deleteUpcomingFighter.

With this change I could remove the bulk table insert above and be guaranteed ensureFighters would never get too many fighters at once.

Save more round trips by caching

This continuing quest to save as many round trips to the database as possible probably happened because I did most of my development on a really slow network. To test with Azure, my own laptop established a connection directly to the Azure SQL database, and each request was correspondingly really slow. I'm talking on the order of multiple seconds for each SQL query. Hoo boy.

So to speed things up for the sake of my development sanity as well as the long term, I added a little caching. When someone successfully fetches the info for a fight club, I cache it in a global map using the fight club name (which is what the client uses to look it up). I also cache all the fighter name lookups, so ensureFighters can skip trips to the database if it has all the names cached.

But what if the site gets popular? What if the server never crashes because my code was just that awesome? Those global maps might get cluttered and kind of big, filled with old fight clubs that nobody's looking at anymore. So I attach a "last access" timestamp to each entry in the cache and periodically scavenge them to remove entries that haven't been used in a while.
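The bookkeeping behind that is nothing fancy. Here's a sketch of the idea (the entry shape, map, and timings are my own guesses, not the real code):

interface CacheEntry
{
    fcInfo : FightClubInfo;
    lastAccessTime : number;
}

let fightClubCache : Map<string, CacheEntry> = new Map<string, CacheEntry>();

function cacheLookup(fcName : string) : FightClubInfo
{
    let entry : CacheEntry = fightClubCache.get(fcName);
    if (entry === undefined)
    {
        return null;
    }

    entry.lastAccessTime = Date.now();
    return entry.fcInfo;
}

// Periodically scavenge entries that haven't been touched in a while. The
// intervals here are arbitrary.
const CACHE_MAX_IDLE_MS : number = 60 * 60 * 1000;
setInterval(function() : void
{
    let now : number = Date.now();
    fightClubCache.forEach(function(entry : CacheEntry, fcName : string) : void
    {
        if (now - entry.lastAccessTime > CACHE_MAX_IDLE_MS)
        {
            fightClubCache.delete(fcName);
        }
    });
}, 10 * 60 * 1000);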

Be careful about leaking transactions

One of the last and most important things I fixed in my SQL query code has to do with always completing transactions that I start. For example, if a new connection comes in and tries to join a fight club as an editor, the server has to do two things: check to see if the user's password is correct, and then fetch the fight club's data.

Recall the sequence of steps for a transaction is: begin, then some number of requests, then a commit or a rollback. Until the commit or rollback happens, the transaction stays open and a database connection from the pool is reserved for it, so it's imperative that no transactions are left in this open state, or else the connection pool will dry up, and new requests will hang.

In earlier versions of my code, doFetch was always the final step in any successful request: at the end of any change or any update, you fetch the fight club info and commit the transaction. But over time several codepaths emerged where you finish a logical request without fetching fight club info: what if you fail the password check? or if the fight club info is retrieved from the cache? or if you're only changing the password? In all of these cases, doFetch isn't needed, so transactions were left hanging around. It was bad stuff.

I instituted a new principle and made all the code adhere to it: if any part of the code calls transaction.begin() it is also responsible for calling commit or rollback. It seems straightforward, but with ES6 Promises and the error handling codepaths, it's easy to miss this. Here's an example of what I did:

lookupFightClubInfoByName(sqlTransaction, fcName).then(function(fcInfoFromLookup : FightClubInfo) : Promise<void>
{
    if (fcInfoFromLookup.isCached)
    {
        // No real query ran, so the commit can happen in the background.
        sqlTransaction.commit();
        return Promise.resolve();
    }
    else
    {
        // A real query ran, so wait for the commit before using the data.
        return sqlTransaction.commit();
    }
});

In this case, I had to add an isCached property to FightClubInfo so that lookupFightClubInfoByName can tell whether it was looked up from the cache or from a real database query. If it was a real database query, then I need to wait on the commit to make sure nothing fails and the data is valid to use; if no query actually happened, I can commit in the background, because the query didn't actually do anything.

You did something. Now tell your friends

The last piece of core functionality is alerting other people watching the fight club when someone makes a change, so they can update their screens. For this, I use the excellent Socket.io module. It sets up a websocket to the server if possible, for low-latency communication between the client and the server, and falls back to a polling channel if it can't.

In v2, every change to the fightclub was done through an HTTP request, but now they are all socket.io handlers. Here's a quick example:

socket.on('addRound', function (fighterA : string, fighterB : string, outcome : EOutcome, callback : (errorInfo : FCError, fc? : FightClubInfo) => void) : void
{
    makeChangeInFightClub('addRound', sqlConnection, socket, function(sqlTransaction : sql.Transaction) : Promise<FightClubInfo>
    {
        return doAddRound(sqlTransaction, socket.fcid, fighterA, fighterB, outcome);
    }, callback);
});

The callback to the client who made the change includes the full, updated contents of the fight club so they can update their UI right away. makeChangeInFightClub also uses the "room" concept in socket.io to notify all the other sockets joined to a room with the new fight club contents, too.
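The broadcast part of makeChangeInFightClub boils down to something like this sketch ('fightClubUpdated' is an illustrative event name, not necessarily the real one):

// When a socket enters a fight club, it joins a socket.io "room" named after it.
socket.join(fcName);

// After a successful change, every other socket in the room gets the fresh contents.
socket.broadcast.to(fcName).emit('fightClubUpdated', fcInfo);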

Tracking socket state

Because v2 was stateless (the Ruby script exited at the end of every request), every request needed to carry with it a bunch of information, such as which fight club you're targeting and the password you're using to gain edit privileges. With the move to socket.io, I can keep track of this kind of state on the socket object itself.

With plain Javascript you might take the socket object and just start adding fields to it. Typescript formalizes this for safety, so I had to make a new interface:

interface DksSocket extends SocketIO.Socket
{
    fcid? : number;
    fcName? : string;
    passwordVersionNum? : number;
    remoteAddress? : string;
    clientAuditingInfo? : ClientAuditingInfo;
}

Interfaces that extend like this are weird. I had to put the ? on every field because when I "upgrade" a plain Socket to a DksSocket those fields aren't guaranteed to be there on the base object--in fact, I hope they aren't.

io.on('connection', function (ioSocket : SocketIO.Socket) : void
{
    let socket : DksSocket = ioSocket;
    // ...
});

Now that it's upgraded, I can set those new attributes. fcid is used internally for certain queries. fcName is the socket.io "room" that it needs to notify of changes. They both always stay in sync. Implicit in this interface is that a socket can only be joined to one fight club at a time. Notice I said "socket" and not "computer". You can open multiple browser tabs and have each one joined to a different fight club, because each one is an independent client connection.

What's the password?

The whole deal with passwordVersionNum is a bit complicated, and I might as well get into it now. It serves two purposes: if it's non-null it means that the socket is authenticated, and it also lets me detect if someone has changed the password on the fight club since this socket was authenticated (and therefore I should reject its requests).

See, I noticed I had a race condition towards the end of development: what if someone is trying to enter a fight club as an editor at the same time as someone else is trying to change that same fight club's password? The sequence of calls might go like the diagram below. Each time the step number increments, it's because an asynchronous task started or finished. If the number doesn't increment between two lines, they are running uninterrupted on the same thread.

Client A                               Client B
1. start entering fight club
   SQL: is password same as in DB?
2. SQL: password check succeeds.
   SQL: query all FC info from DB
3.                                     SQL: change password
4. SQL: FC info returned
   mark socket as authenticated

Uh oh, looks like client A managed to slip by! Just because Javascript is single-threaded doesn't mean it can't have race conditions.

The modified protocol for password checks protects against this:

Client A                               Client B
1. start entering fight club
   SQL: is password same as in DB?
2. SQL: password check succeeds, and
   here is the current version number.
   SQL: query all FC info from DB
3.                                     SQL: change password, and also increment
                                       the version number
4. SQL: here is the FC info, including
   password version number as of #3.
   if it's different than #2, denied!

Once again it's good I wrote this blog post, because the modified protocol you see above is the result of me finding yet another hole and patching it.

With the socket marked with the password version number, I can just check if it's non-null to know if it's authorized, whenever it tries to make a change. When someone changes the password of the fight club, I can easily de-authorize all the other sockets joined to that fight club by setting the password version number to null.
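In code, the check on every change attempt and the de-authorization look something like this sketch (the helper names are made up):

// A null (or missing) password version number means the socket isn't authorized,
// either because it never authenticated or because the password changed since then.
function isSocketAuthorized(socket : DksSocket) : boolean
{
    return socket.passwordVersionNum !== null && socket.passwordVersionNum !== undefined;
}

// When the fight club's password changes, de-authorize every other socket joined to it.
function deauthorizeOtherSockets(socketsInFightClub : DksSocket[], changer : DksSocket) : void
{
    for (let other of socketsInFightClub)
    {
        if (other !== changer)
        {
            other.passwordVersionNum = null;
        }
    }
}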

Rate limiting, v3

I needed to add in rate limiting to v3. The old v2 scheme that used file locking was flawed, but with a server always running I can now do this easily. I did the same kind of bookkeeping--each socket.io connection has a pool of tickets, withdraws from it as it talks to the database, and recharges at a certain rate.

When a new socket connects to the server, I start tracking it in a global map from client IP address to a list of sockets. I also create a new ClientAuditingInfo object to store the rate limiting info, or share an already-existing one, if it's not the first socket from the client IP.

Since all the sockets for a given client IP address share the auditing info, they all get rate-limited together, so you can't get around the limits by opening a second browser tab.
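The ticket bookkeeping itself is a simple recharging pool. Here's a sketch (the class name, field names, and rates are my guesses; the example rate is just the 60-writes-per-minute figure from v2):

// Sketch of a ticket pool that recharges over time.
class TicketPool
{
    private tickets : number;
    private lastRechargeTime : number;

    constructor(private maxTickets : number, private ticketsPerMs : number)
    {
        this.tickets = maxTickets;
        this.lastRechargeTime = Date.now();
    }

    // Returns true if the caller may hit the database, consuming one ticket.
    public tryConsumeTicket() : boolean
    {
        let now : number = Date.now();
        this.tickets = Math.min(this.maxTickets, this.tickets + ((now - this.lastRechargeTime) * this.ticketsPerMs));
        this.lastRechargeTime = now;

        if (this.tickets >= 1)
        {
            this.tickets--;
            return true;
        }

        return false;
    }
}

// One pool per client IP address, shared by every socket from that IP.
// Example rate: max 60 tickets, recharging at 60 per minute.
let ticketPoolsByClientIp : Map<string, TicketPool> = new Map<string, TicketPool>();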

Getting the client IP in Azure

The rate limiting code heavily relies on the client IP address. I did most of my testing locally, hitting localhost, but I found when I deployed to Azure, I was getting undefined from my code to look up the client IP. All I was looking at was socket.conn.remoteAddress, and I thought this would be pretty straightforward. After all, the server is getting a connection from somewhere, right?

Well, yes and no. On a regular PC, it certainly is, but in Azure, NodeJS runs behind Microsoft's web server, IIS, via something called iisnode, and gets connections through named pipes, not directly through sockets. So socket.io gets confused and doesn't know the client's IP directly.

Or does it? I tried a number of other properties: socket.client.conn.remoteAddress, socket.request.connection.remoteAddress, and socket.handshake.address all didn't work. On the verge of giving up, I finally found a useful page about not being able to get the client IP. Although the issue isn't fixed, at least there's a workaround: use the X-Forwarded-For HTTP header. I enable it with a directive in my web.config (<iisnode enableXFF="true" />) and look for it with socket.handshake.headers['x-forwarded-for'].

I then made the code strict about getting the client IP, because without it I can't rate-limit properly. If I can't determine the client IP for some reason, I just disconnect the socket outright. Too bad!
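Put together, the lookup ends up being something like this sketch (getClientIp is a name I made up for illustration):

// Sketch of reading the client IP behind iisnode, assuming <iisnode enableXFF="true" />
// is set in web.config. X-Forwarded-For can be a comma-separated chain of addresses;
// the first entry is the original client.
function getClientIp(socket : DksSocket) : string
{
    let xff : string = socket.handshake.headers['x-forwarded-for'];
    if (xff)
    {
        return xff.split(',')[0].trim();
    }

    // Locally (not behind IIS), the socket knows the remote address directly.
    return socket.conn.remoteAddress;
}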

Let's use HTTPS!

The last thing I really wanted to have with this project was HTTPS support, because you should secure all the things. From many places, an SSL certificate costs like 50 bucks a year, which is way more than I am willing to pay for this. I'd be okay with paying... zero dollars a year. Is there anyone who will sell me a cert for zero dollars? Anyone?

Wait, there actually is one! It's called Let's Encrypt, and it's super cool. You install a daemon on your web server that automates renewing the certificate every 3 months. There's a super janky workflow to set up that daemon on Azure. Luckily, many others have led the way, and their excellent blogs were invaluable for me to get it working.

I got hung up in two places. Firstly, the Let's Encrypt Azure Site Extension uses something called an Azure Web Job with Azure Storage, and that thing needs some connection strings set up. In fact, it needs two "custom" connection strings with the same value, named AzureWebJobsStorage and AzureWebJobsDashboard. I think at first I only had one connection string, and the site extension failed with the most confusing errors possible.

Secondly, I had to make a small change to the code that sets up Express as a web server. The renewal process for the certificate works by a challenge/response between the daemon on the web site and the Let's Encrypt certificate authority. The daemon initiates the request, and the CA asks it to prove that it has authority over the web site. The way the daemon does this is by making a random URL on the web site available, generally by writing to a file. By default, express doesn't make such paths on the server available, so I had to open it up.

// Open the .well-known directory for Let's Encrypt to use for renewal
// handshakes.
app.use('/.well-known', express.static('.well-known'));

With HTTPS nicely set up, why would I want people to use HTTP at all? I disabled it with an IIS URL rewrite rule:

<!-- Redirect all traffic to SSL -->
<rule name="ForceHTTPS" enabled="true">
  <match url="(.*)" ignoreCase="false" />
  <conditions>
    <add input="{HTTPS}" pattern="off" />
  </conditions>
  <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" appendQueryString="true" redirectType="Permanent" />
</rule>

Finally, I used this opportunity to buy knutaf.com, so I could use the snazzy site dksfight.knutaf.com for the SSL cert. I have a few other little things I learned along the way that I'll write about in later posts, but this one has gotten criminally long as it is, so I will end it now. Give it a try if you like!
