SimpleDB w/ AWS JDK

Problem to solve

A flow on the app server triggers an event that several listeners have to handle, and each listener needs to run for a couple of seconds (or more).  Obviously the web request has to return relatively fast, so doing all that synchronously is out of the question.

One solution is to use a thread pool with N worker threads to work through these queued-up tasks.  But that approach also takes processing power away from the app server.  It works sometimes and not others, depending on the load and the cost of these tasks.
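
For reference, a minimal sketch of that thread-pool option using plain java.util.concurrent. The names here (ListenerPool, dispatch, shutdownAndWait) are made up for illustration and are not from the actual app; the pool size and timeout are arbitrary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: ListenerPool and its methods are illustrative names only.
class ListenerPool {
    private final ExecutorService workers;

    ListenerPool(int workerCount) {
        // N worker threads share one in-memory queue of pending tasks.
        this.workers = Executors.newFixedThreadPool(workerCount);
    }

    // Hand every listener to the pool and return right away,
    // so the web request does not wait for the slow work.
    void dispatch(List<Runnable> listeners) {
        for (Runnable listener : listeners) {
            workers.submit(listener);
        }
    }

    // Stop accepting new tasks and wait for the queued ones to finish.
    // Returns false if the timeout elapsed or the wait was interrupted.
    boolean shutdownAndWait(long timeout, TimeUnit unit) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ListenerPool pool = new ListenerPool(4);
        List<Runnable> listeners = Arrays.asList(
            () -> System.out.println("listener A done"),
            () -> System.out.println("listener B done"));
        pool.dispatch(listeners);
        pool.shutdownAndWait(10, TimeUnit.SECONDS);
    }
}
```

Note that the queue here lives in the app server's memory and the workers run on its CPUs, which is exactly the trade-off described above.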

Another solution, if timing is not that important, is to push a request onto a persistent message queue and have worker processes pull requests off that queue and process each one.  Nice separation of concerns.  Decoupling. Etc.  All the nice things that message queue vendors say.

Each message has to be processed in sequence relative to others.  In other words, FIFO.

Why SimpleDB?

There are full-blown message queue solutions (ZeroMQ & ActiveMQ come to mind). But then I’d have to get the “DevOps” people to stand one up and manage it which is kinda like pulling teeth.

Then there’s the good ol’ standby: PostgreSQL.  Others have used a dedicated table for queuing purposes, with a “status” column and values like “Pending,” “Processing,” “Success,” and “Fail.”  It’s persistent, and the synthetic row ID can be used for sequencing to get the FIFO iteration. But I have always hated the idea of using a table to implement a queue, mostly because of all the hassle of checking in a migration script to create the damn table and the boilerplate process that goes with it.  And the schema is inflexible if I ever need to tweak the message structure.  So I either need to worry about schema changes or break first normal form.

Then there are SaaS offerings.  We use EC2 and various other services from the AWS stack, so it makes sense to see what can work.

SQS — not FIFO

Initially I had gone with SQS for a work queue. It worked well for some other things I’ve done in the past.  BUT in this case I actually need strict ordering, which SQS does not provide. I tried to get around that using a timestamp in the message, but unless I read all the messages (or a sufficient number of them), I can’t tell if the first message I get is the right one.  Then I have to cache the messages locally to sort them, and then work through them locally and/or re-queue the rest.  Re-queuing puts them further back in the queue when they should probably be at the head, and the sequencing just gets worse.  Yuck.

Comes SimpleDB

So then comes SimpleDB. It’s persistent, almost like an RDBMS but without the strict schema and DDL BS (those are good things for domain data, to be sure, just not for my message queue).

So each “row” will have the attribute “timestamp”, which currently is just the Unix time in milliseconds (System.currentTimeMillis()), a convenient long that is easy to sort by to get the FIFO behavior. The rest of the attributes are basically whatever I need for my message.
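
As a rough sketch of what such a message could look like before it is handed to the SDK, here is one way to build the attribute map in plain Java. The class and method names (QueueMessage, sortableTimestamp, attributes) and the sample payload keys are hypothetical. The zero-padding is my own defensive assumption: SimpleDB stores and compares values as strings, so padding to a fixed width keeps string order the same as numeric order. Today's 13-digit millisecond values already sort correctly without it.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not part of the AWS SDK.
class QueueMessage {

    // Width 19 fits any non-negative long, so comparing the padded
    // strings gives the same order as comparing the numbers.
    static String sortableTimestamp(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        return String.format(Locale.ROOT, "%019d", millis);
    }

    // Copy the free-form payload, then set the "timestamp" attribute
    // used for FIFO ordering (it overrides any payload key of that name).
    static Map<String, String> attributes(long millis, Map<String, String> payload) {
        Map<String, String> attrs = new LinkedHashMap<>(payload);
        attrs.put("timestamp", sortableTimestamp(millis));
        return attrs;
    }

    public static void main(String[] args) {
        Map<String, String> payload = new LinkedHashMap<>();
        payload.put("event", "user-signed-up");   // made-up example attribute
        payload.put("userId", "42");              // made-up example attribute
        System.out.println(attributes(System.currentTimeMillis(), payload));
    }
}
```

Each entry of the resulting map would then become one attribute of the SimpleDB item.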

AWS Console: no SimpleDB here

I had to check my glasses because I could not find a page/console for SimpleDB. Almost every service in the AWS suite has at least a page for managing instances of the service.  DynamoDB has one, for instance.  (Incidentally, I didn’t go with DynamoDB since it just felt like overkill at the time.)

Anyway.  No management page.  So it’s a bit frustrating to get things up for testing and to figure out what went wrong when things don’t work.  There IS a tool: SimpleDB Scratchpad (https://aws.amazon.com/code/JavaScript/1137), but that needs to be downloaded and “installed.” The installation process is more than just unpacking the files:

  1. Change the various endpoints to point to the correct one for the region I want (i.e. change “sdb.amazonaws.com” to “sdb.us-west-2.amazonaws.com” for the US-WEST-2 region).
  2. Pre-fill the key and secret in the navbar since it’s tedious to type that in all the time.

Once that’s all done, simply opening webapp/index.html as a file in a browser pretty much works, except in Chrome, which thinks it’s more clever than you about JavaScript security.  Fine.  Firefox, it is.

Weird SELECT syntax

The SELECT query syntax is deceptively similar to SQL, but it is not really SQL.  Obviously joins are out of the question, but even simple queries have nuances:

  • If the name of your table (“domain”) contains anything other than alphanumerics and _, OR if it starts with a digit, you need to quote it with back-ticks (`).  E.g. SELECT * FROM `my-domain` …
  • To sort on (aka ORDER BY) something, that thing has to be in the WHERE clause (huh ?!) E.g. SELECT … WHERE timestamp > '0' ORDER BY timestamp
  • Why the quotes around 0? All literals seem to be of type text, even things like the timestamp.
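
To keep those quirks in one place, the query string can be assembled by a small helper. This is only a sketch in plain Java: SelectBuilder, quoteDomain, quoteValue, and oldestFirst are hypothetical names, not part of the AWS SDK. The quoting check follows the rule as described in the bullets above, and the LIMIT clause is an extra I added so that only the head of the queue comes back.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: builds the SELECT text only; it does not call AWS.
class SelectBuilder {
    // Names made of letters, digits and _ that do not start with a digit
    // can be used as-is; everything else gets back-ticks.
    private static final Pattern PLAIN_NAME =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static String quoteDomain(String domain) {
        if (domain.indexOf('`') >= 0) {
            // Simplification: this sketch does not handle embedded back-ticks.
            throw new IllegalArgumentException("back-tick in domain name");
        }
        return PLAIN_NAME.matcher(domain).matches() ? domain : "`" + domain + "`";
    }

    // All literals are text: wrap in single quotes and double any
    // single quote inside the value.
    static String quoteValue(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // The sort attribute has to appear in the WHERE clause,
    // hence the otherwise pointless "> '0'" condition.
    static String oldestFirst(String domain, String sortAttribute, int limit) {
        return "SELECT * FROM " + quoteDomain(domain)
            + " WHERE " + sortAttribute + " > " + quoteValue("0")
            + " ORDER BY " + sortAttribute + " ASC"
            + " LIMIT " + limit;
    }

    public static void main(String[] args) {
        // Prints: SELECT * FROM `my-domain` WHERE timestamp > '0' ORDER BY timestamp ASC LIMIT 1
        System.out.println(oldestFirst("my-domain", "timestamp", 1));
    }
}
```

The resulting string is what would be passed as the select expression to the SimpleDB client.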