Wednesday, December 29, 2010

Debugging MapReduce in MongoDB

On a project that I am working on, we are doing some pretty intense MapReduce work inside of MongoDB. One of the things we've run up against is the lack of solid debugging tools. Some Googling basically tells you that print() is all you've got.

We've decided to take a different approach and debug our MapReduce code in the browser. Since the code is JavaScript and modern browsers have excellent debugging support (breakpoints, variable inspection, etc.), it's pretty easy to do.

All you need is a web app (or even a static HTML file) that will:

  1. Load up one of your documents that you would like to map in the browser. Since the documents are JSON, this is easy. In our project, we have JSON fixture files and a small web app that allows you to choose which fixture to use for testing.
  2. Mock the emit() method. You can just have it write each key and value to a plain JavaScript object (a hash) that you can inspect later.
  3. Load up the Map and Reduce functions. If you keep these in separate .js files, you can pull them in with a simple script tag.
  4. Bind the map function to the document so that it has the correct context. In a MongoDB mapper function "this" is set to the document that you are mapping. You can easily do this with the bind() function in Underscore.js. I'm sure that other JavaScript frameworks provide a similar function.
  5. Put a link on the page that will let you run the bound function.

This will emulate the MongoDB MapReduce environment, but you can now use the browser's debugging tools.
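
The steps above can be sketched roughly as follows. This is a minimal illustration rather than our actual harness: the fixture documents, `mapFn`, `reduceFn`, and `runMapReduce` are all made-up names, and it uses the native `Function.prototype.bind` where the post uses Underscore's `_.bind(mapFn, doc)`. It also calls reduce for every key, which MongoDB does not necessarily do when a key has only one value.

```javascript
// Collected output of the mocked emit(); inspect it in the debugger or console.
var emitted = {};

// Stand-in for MongoDB's emit(): record every value under its key.
function emit(key, value) {
  var k = String(key);
  if (!Object.prototype.hasOwnProperty.call(emitted, k)) {
    emitted[k] = [];
  }
  emitted[k].push(value);
}

// Hypothetical map function; in practice this is loaded from your own .js file.
function mapFn() {
  emit(this.make, 1);
}

// Hypothetical reduce function that sums the emitted values for a key.
function reduceFn(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Hypothetical fixture documents, standing in for the JSON fixture files.
var fixtures = [
  { make: "Ford", model: "Focus" },
  { make: "Ford", model: "Fiesta" },
  { make: "Honda", model: "Civic" }
];

// Run the map function once per document with `this` bound to that document,
// then reduce the values collected for each key.
// Set a breakpoint inside mapFn or reduceFn to step through them.
function runMapReduce(docs) {
  emitted = {};
  for (var i = 0; i < docs.length; i++) {
    var boundMap = mapFn.bind(docs[i]);
    boundMap();
  }
  var results = {};
  for (var key in emitted) {
    if (Object.prototype.hasOwnProperty.call(emitted, key)) {
      results[key] = reduceFn(key, emitted[key]);
    }
  }
  return results;
}

var results = runMapReduce(fixtures);
console.log(results);
```

In a page, `runMapReduce(fixtures)` would be wired to the click handler of the link from step 5 instead of being called directly.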

Tuesday, December 14, 2010

Using Underscore.js with MongoDB

I've been using MongoDB for a while now and have been really happy with it. I wanted to share something we are doing on one of the projects I work on that makes working with Mongo even better.

MongoDB lets you use JavaScript to do lots of work on the server side. This includes running MapReduce jobs on collections, but JavaScript can also be used in where clauses and for grouping. Being able to use JavaScript for these things is handy, but using just the core JavaScript language can be less than ideal. That's why we prime our MongoDB environment with Underscore.js.

The Underscore website describes it as a JavaScript utility belt, and I've found that to be the case. It has functions like any or include that save you the trouble of having to write for loops to iterate over arrays. While the MongoDB documentation describes how you can store individual functions for server-side use, it doesn't really touch on how you could load an entire library like Underscore.

It turns out you can load up libraries like this pretty easily using db.eval(). I recall reading (but can't currently find the docs to prove this) that every MongoDB connection has a JavaScript context associated with it. If you create functions in this context, they will exist as long as the connection is around. So if you just eval the Underscore.js library before you do any work with your connection, you will have access to all of its functions to do your work.

Here is an example of how to use Underscore.js with the Ruby driver. In this example, I'll set up the MongoDB connection with Underscore.js, create a sample dataset of cars, then use Underscore to group them by make without repeating model.

The only downside to this approach is that db.eval does not seem to work with sharding. That is OK for me right now, but YMMV. Also note that I am using the awesome_print gem to pretty print the results.
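
For reference, the grouping step described above (cars grouped by make, with each model listed only once) can be sketched in plain JavaScript. This is only an illustration of the logic, not the Ruby example itself: the car data and the `modelsByMake` name are made up, and the loop is written by hand so it runs without any library. With Underscore loaded, helpers such as `_.groupBy` and `_.uniq` could replace most of it.

```javascript
// Hypothetical sample data; note the duplicate Ford Focus.
var cars = [
  { make: "Ford", model: "Focus" },
  { make: "Ford", model: "Focus" },
  { make: "Ford", model: "Fiesta" },
  { make: "Honda", model: "Civic" }
];

// Group model names by make, skipping models already seen for that make.
function modelsByMake(docs) {
  var grouped = {};
  for (var i = 0; i < docs.length; i++) {
    var car = docs[i];
    if (!Object.prototype.hasOwnProperty.call(grouped, car.make)) {
      grouped[car.make] = [];
    }
    if (grouped[car.make].indexOf(car.model) === -1) {
      grouped[car.make].push(car.model);
    }
  }
  return grouped;
}

var grouped = modelsByMake(cars);
console.log(grouped);
```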