Monday, May 17, 2010

Gwibber on Ubuntu 10.04 issues with FiOS

I just upgraded one of my machines to the latest and greatest Ubuntu because I plan on taking it on the road later this week. After I got everything set up, I fired up Gwibber, my favorite Twitter client on Linux. Immediately, I started running into problems. I couldn't get Gwibber to load any new tweets. There seem to be several people who are experiencing this issue with Gwibber, but their troubles are related to the language settings. That was not the case for me.

I did some digging by firing up Gwibber in a terminal:


$> gwibber-service -o -d



This is what I got:





Gwibber wasn't refreshing because it was timing out on the DNS lookup. I have Verizon FiOS as my ISP, and its DNS servers seem to be terribly slow. I switched over to Google DNS and everything is snappy and working properly. If you're using Gwibber on FiOS and having issues, try this out. It may save you an hour or three.
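To check whether DNS really is the bottleneck, lookups can be timed against different resolvers. Here is a minimal Ruby sketch using the standard Resolv library. The method name time_lookup and the host name in the example are illustrative choices and have nothing to do with how Gwibber does its lookups; 8.8.8.8 is one of the Google Public DNS addresses.

```ruby
require 'resolv'

# Time a single lookup of +host+ with +resolver+ (any object that responds
# to #getaddress). Returns [seconds, address], or [seconds, nil] when the
# lookup fails.
def time_lookup(host, resolver)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  address = begin
    resolver.getaddress(host).to_s
  rescue Resolv::ResolvError
    nil
  end
  [Process.clock_gettime(Process::CLOCK_MONOTONIC) - started, address]
end

# Example (needs network access, so it is left commented out):
#   system_default = Resolv::DNS.new                     # reads /etc/resolv.conf
#   google = Resolv::DNS.new(nameserver: ['8.8.8.8'])    # Google Public DNS
#   [system_default, google].each do |dns|
#     seconds, address = time_lookup('twitter.com', dns)
#     puts format('%.3fs -> %s', seconds, address || 'lookup failed')
#   end
```

If the ISP's resolver takes several seconds per lookup while the alternative answers in milliseconds, a short client-side timeout would explain the behavior described above.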

Gwibber, or whatever library it is using for network communication, picked a pretty short timeout, but this is pretty lame. Verizon really needs to step it up here. People will perceive FiOS as slow because it takes forever to look up an IP, even though the network is pretty quick in my experience.

Thursday, April 1, 2010

Using xargs with git

Sometimes, when I'm working on a project, I'll create a bunch of new files and realize that I have a ton of untracked stuff that I need to add to my git repository. Since I generally only use git on the command line, it would be painful to copy and paste all of the untracked file names from the output of git status into separate git add commands.

The two commands I have found handy for dealing with this situation are git ls-files and xargs.

If you run the command:
git ls-files -o
It will show you all of the untracked files in your working directory, one file per line. A problem that you will run into here is that it also shows files in your .gitignore. To get around this issue, you just need another argument to specify your .gitignore:
git ls-files -o --exclude-per-directory=.gitignore
Now that you have all of the files you want to add, you just need to run git add on all of them. This is where xargs comes in handy. It reads from standard input, splits it on whitespace (so file names containing spaces need git ls-files -z and xargs -0) and passes the pieces as arguments to another command. Putting it all together, you get:
git ls-files -o --exclude-per-directory=.gitignore | xargs git add
That last command will add any untracked files to your git repository. The beautiful thing here is that we can also leverage some UNIX-y goodness. Let's say we're working on a project and we only want to commit some XSLT we have been working on. You can do this by throwing grep into the command chain:
git ls-files -o --exclude-per-directory=.gitignore | grep xslt | xargs git add
This will only add files that contain "xslt" in their paths. The same approach comes in handy when you remove files from your working copy but forget to run git rm: git ls-files -d lists the deleted files, which you can pipe to xargs git rm.
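When the filter is easier to express in Ruby than in grep, the same pipeline can be sketched in a few lines. This is only an illustration: the method names are made up, and it swaps in --exclude-standard (which honors .gitignore along with git's other standard exclude files) and the NUL-delimited -z output so that paths with spaces survive.

```ruby
# Split NUL-delimited `git ls-files -z` output into paths and keep the ones
# whose path matches +pattern+.
def select_untracked(ls_files_output, pattern)
  ls_files_output.split("\0").grep(pattern)
end

# Stage the matching untracked files in the current repository and return
# the list of paths that were passed to git add.
def add_untracked(pattern)
  output = IO.popen(%w[git ls-files -o --exclude-standard -z], &:read)
  paths = select_untracked(output, pattern)
  system('git', 'add', '--', *paths) unless paths.empty?
  paths
end

# Example: add_untracked(/\.xslt?\z/)
```

Passing the paths to system as separate arguments avoids the shell entirely, so no quoting is needed.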

Monday, November 23, 2009

Classy hData

I've been working on a team that is looking at ways in which we can simplify the exchange of information in Health IT. This effort is called hData. We just released a new version of our packaging and network transport spec, and I would like to talk a bit about how we arrived at this version.

I think it is really important for IT specifications to have a reference implementation available. If you build a spec without code, it's really hard to see where you have gone wrong. To make sure we are on the right track, I built a small web application that implements the spec. I was able to quickly uncover some bugs in our work. Bugs I'm sure we would have missed by just reading the document.

Technology Choices

Since my language of choice is Ruby, it would be natural to think I would want to tackle this project in Rails. However, in hData we make good use of the HTTP verbs, and I'm not so sure that they would line up seamlessly with Rails conventions. I decided to go with a much simpler choice. Sinatra is a small web framework that seems perfect for this job. It makes the HTTP verbs central to your code, so it should be fairly obvious how we go from the spec to implementation.

There are a few other tools that I used on this adventure. DataMapper was just right for the ORM needs of the project. I could have used ActiveRecord to persist data, but DataMapper has a really nice auto-migration feature, which saved me from writing all of the database creation code. I also used Bundler to manage my application's dependencies.

Getting Started

The best way to get started here is by taking a test driven approach to the spec. For that I will be using Shoulda and Rack Test. With my TDD tools in place, I can take part of the spec that looks like this:


3.1.2 POST

3.1.2.1 Parameters: type, typeId, requirement

For this operation, the value of type MUST equal "extension". The typeId MUST be a URI string that represents a type of section document. The requirement parameter MUST be either "optional" or "mandatory". If any parameters are incorrect or not existent, the server MUST return a status code of 400.

If the system supports the extension identified by the typeId URI string, this operation will modify the extensions node in the root document and add this extension with the requirement level identified by the requirement parameter. The server MUST return a 201 status code.

If the system does not support the extension, it MUST not accept the extension if the requirement parameter is "mandatory" and return a status code of 409. If the requirement is "optional" the server MAY accept the operation, update the root document and send a status code of 201.

Status Code: 201, 400, 409

and turn it into a Shoulda context block. In the spec above, we're talking about what should happen when you POST to the root of an hData Record. The functionality being described here is how an extension can be added to the record, or how you can register a different type of thing for a record. For example, you could use this feature to add a medications extension to a record, if one did not exist there already. In our test code, we're going to try to register an allergies extension:

As you can see from the code, the combination of Shoulda and Rack Test makes it really easy to express the requirements set forth in the specification. The first test tries to POST an incomplete request and should receive an error. The second sends a properly formed request and should get an appropriate response. The last test tries to POST a duplicate extension.
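The status-code rules from section 3.1.2 quoted above can also be sketched as a plain Ruby function, independent of Sinatra, Shoulda and Rack Test. This is an illustration and not the code from classy-hdata: the method name, the supported: argument and the example typeId URI are hypothetical, URI validation of typeId is reduced to a non-empty check, and the sketch takes the "MAY accept" branch for unsupported optional extensions.

```ruby
REQUIREMENTS = %w[optional mandatory].freeze

# Return the HTTP status the spec calls for when POSTing an extension.
# +params+ is a hash of request parameters; +supported+ says whether the
# server knows the extension named by typeId.
def extension_post_status(params, supported:)
  valid = params['type'] == 'extension' &&
          !params['typeId'].to_s.empty? &&
          REQUIREMENTS.include?(params['requirement'])
  return 400 unless valid   # incorrect or missing parameters
  return 201 if supported   # extension is added to the root document
  return 409 if params['requirement'] == 'mandatory'

  201                       # unsupported but optional: accepted in this sketch
end

allergies = {
  'type' => 'extension',
  'typeId' => 'http://example.org/allergies', # hypothetical URI
  'requirement' => 'mandatory'
}
puts extension_post_status(allergies, supported: true)
puts extension_post_status(allergies, supported: false)
puts extension_post_status(allergies.merge('requirement' => nil), supported: true)
```

Each branch corresponds to one sentence of the spec text, which is what makes this kind of requirement easy to turn into one test per status code.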

With the tests in place, we can move on to implementation.

I have created a DataMapper Resource to capture all of the information we want to store about an extension. I will also use the validation framework of DataMapper to make sure that all of the requirements for an extension are met. I end up with the resulting code:


With my model in place, I can implement the code to handle the web request:


The code above is pretty typical for Sinatra. The post block handles POSTs to the root URL. There I call a method to check that the type parameter is set. If it isn't, I halt the processing and let the user know that the request is malformed with a 400 code. If the type is set to extension, then we drop into the handle_extension method. Inside of the method, I build an Extension object and check it using the DataMapper validation framework.

There is a little bit of funkiness at the end of the handle_extension method where I need to check the type of error. This is due to the fact that I need to return different status codes depending on the error. Unfortunately, with the DataMapper validations, I didn't see any way to return anything with the errors other than a text message, so this seemed like the best way of doing things.

The handle_section at the end of the post block handles another part of the spec. Don't worry, I didn't write it until I had the tests done first.

Lather, Rinse, Repeat

Implementing the rest of the hData Packaging and Transport spec followed the same process: take the spec and write a matching unit test, then implement the spec and refine the code until the test passes.

In doing this, I found a couple of bugs in our spec. We hadn't provided parameter names for POSTing section documents. Our description of how to add metadata to documents was ambiguous at best. The nice part was that I was able to discover these things before even digging into the implementation.

What still needs to be done

While the Sinatra app that I wrote is a pretty good implementation of the hData Packaging and Transport spec, it still has some gaps. It doesn't support POSTing metadata on documents; it only creates and serves its own. It also doesn't support nested sections, but that shouldn't be too hard to add.

Wrap Up

You can find the code at eedrummer/classy-hdata on github. Even if you aren't interested in hData, this application should serve as an example Sinatra/DataMapper application. If you dig into the code and the hData spec, I think you'll see that hData is really easy to implement, especially in a classy framework like Sinatra.

Saturday, October 17, 2009

Teaching at the Ruby on Rails Workshop for Women

Today I taught the beginner class at the Ruby on Rails Workshop for Women. First of all, I want to thank Mary Tolbert for getting me involved. Also, a big thanks to Sarah Allen and Liana Leahy for putting things together and giving me a chance to say that "I taught a class at Harvard."

I was a bit intimidated about teaching the class at first. I was upgraded from TA status to teacher mid-week and was a bit worried that I wouldn't know what to say. Maybe I was right to be worried. Also, many of the TAs were rockstar Rubyists and I have to admit that I felt a little silly speaking authoritatively in front of them.

Once I started getting into the material I started to feel pretty comfortable. Watching people have a-ha moments is really rewarding. Trying to explain programming without falling back on computer science terms is kinda tricky on the fly. Maybe if I do something like this again, I can speak on things a little more smoothly.

It's pretty incredible how much the students were able to pick up in a day. I know that we had some people in the class that had never written code in their life and it was really cool to see them hacking in just a few hours.

If I were to do it over again, I would suggest just a plain Ruby class for beginners. Or maybe a two-day class where Rails is covered on the second day. By the time I thought the students were getting comfortable with Ruby, we only had one session left. I think that everyone was with me when we were modifying the default index page in Rails, but I wanted to make sure that the students saw the Ruby bits too. So I rushed through controllers and views in 15 minutes, and I would be surprised if anyone got anything out of it. It probably would have been best to punt on the controllers and just explain a little about HTML and CSS.

I noticed that for a lot of beginners, application switching was killer. Switching between the terminal, browser and text editor is second nature for me, but not so much for those just getting started. I wish I had a ton of screen real estate to keep all three visible at once, but I don't know if that would actually help.

We didn't get to really cover git or Heroku in the class. I was able to tell the students enough to use them, but not understand them. The localhost vs. Heroku server was definitely a stumbling block for some. For beginners, I might punt on Heroku and just work from localhost without any source control just to get started.

I got a ton of positive feedback on Twitter and in person. This was super encouraging. I've long considered teaching at some level later in my career, so it seems like I am on the right track there.

Overall, it was an awesome experience. One thing I was surprised to hear is how intimidating it is for women to participate in local open source meetups. I suppose that being a 6' 2" person who has played contact sports leaves me in a position where I am generally not too scared of software developers. But getting up in front of the class today with all of the TAs looking at me, I think I'm starting to get it. Hopefully, events like these can start to even out the gender balance so that more women feel comfortable participating in the Ruby and FOSS community.

Tuesday, August 18, 2009

Building Tokyo Cabinet for use with Java on OS X

I've been really interested in playing with Tokyo Cabinet lately. I thought that it would be fun to take a hack at the GitHub Contest using Scala and Tokyo Cabinet. I then set out to build Tokyo Cabinet and its Java bindings (since I can call those easily from Scala). The Java bindings for Tokyo Cabinet are not pure Java. They use JNI, so you need to compile some C as well as Java. Everything looked fine and dandy until I tried to run some code. I then ran into this stack trace from Scala:



To translate, what is going on here is that by default Tokyo Cabinet will build 32-bit binaries. Java 1.6 on OS X is 64-bit and will look for a 64-bit version of the library. Here is what I did to make things happy.

When running the configure script for Tokyo Cabinet itself, I added a flag:



I tried the same trick when configuring the Java bindings, but it didn't seem to end up in the resulting Makefile. So I edited the Makefile by hand. In the end, my CFLAGS line looks like this:



After that, I was able to get a small Scala script to create a Hash database.

As an aside, the Scala IDE for Eclipse seems really nice. I had tried it out a few months ago, and it has clearly made a lot of progress since then.

Saturday, November 22, 2008

Glassfish is looking speedy

Let me start by saying that most benchmarks should be taken with a grain of salt, so please do the same here.

I work on Project Laika, and for our deployments we are looking to switch to JRuby. We already have Java code hanging around, and it looks like we will need some more soon (I'm thinking it will be easier to deal with SOAP web services in Java and call the classes from JRuby). I wanted to run some numbers to make sure that our performance wouldn't fall through the floor.

The Setup

I decided to pit Mongrel 1.1.5 running on MRI 1.8.6 against Glassfish v3 Prelude 1 and JRuby 1.1.5 on Java 5. I'm running Rails 2.0.2 in both setups (I know we are behind the times). I ran them both on OS X 10.5.5 and had the Rails apps hit the same MySQL database.

I used ab (ApacheBench) to grab the numbers. I had it hit the site 1000 times with 10 concurrent requests (-n 1000 -c 10). I hacked the Laika app slightly so that you didn't have to log in.

The Test

I wanted to get a feel for how the app would perform, so I did two simple tests: dynamic content from a typical page and static content. The dynamic content was the patient template library in Laika which contains code that you'd expect in a Rails app: AR pulling info from the DB and putting it into ERB templates. I also pulled down the Rails 404 page to get a feel for serving static content. This is probably less meaningful, as you'll probably have Apache or Nginx serve up your static stuff.

The Results

Glassfish, hands down. The results are the average number of milliseconds it took to serve each request.


It beat Mongrel easily on both static and dynamic content. Glassfish v3 also makes it ridiculously easy to deploy Rails apps. You can use the Glassfish gem and serve up your app with a single command. I installed the full Glassfish server so I could run JEE apps alongside my Rails stuff. There, too, a single command points the app server at the root directory of your app, and you're done.

With Rails 2.2 now out and offering thread safety, and JRuby being the only interpreter that can take advantage of it, Glassfish and JRuby are really worth checking out.

Friday, September 5, 2008

SezHoo and WikiSym 2008

I'm proud to announce the release of SezHoo! It's a tool we created at MITRE to help establish reputations for authors in wiki sites... specifically those run on MediaWiki (the same software that runs Wikipedia).

The way that SezHoo works is that it goes through a wiki article's history and tracks the authorship of each word in the article. We use a pretty popular technique in text analysis called shingling to help us out.

For each revision of an article, we break it up into shingles. We're using shingles that are 6 words long, so each word appears in up to 6 shingles. We credit authorship of the word to the author of the earliest of those shingles.
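As a rough illustration of the idea, here is a small Ruby sketch of shingle-based attribution. It is one reading of the description above and not SezHoo's actual code: the method names, the whitespace tokenization and the [author, text] revision format are all assumptions.

```ruby
SHINGLE_SIZE = 6

# All runs of +size+ consecutive words, in order of their starting position.
def shingles(words, size = SHINGLE_SIZE)
  words.each_cons(size).to_a
end

# Map each shingle to the author and revision index where it first appeared.
# +revisions+ is an array of [author, text] pairs, oldest first.
def first_seen(revisions, size = SHINGLE_SIZE)
  seen = {}
  revisions.each_with_index do |(author, text), index|
    shingles(text.split, size).each do |shingle|
      seen[shingle] ||= { author: author, revision: index }
    end
  end
  seen
end

# Credit each word of the latest revision to the author of the earliest
# shingle that covers it. Words covered by no shingle get nil.
def word_authors(revisions, size = SHINGLE_SIZE)
  seen = first_seen(revisions, size)
  words = revisions.last[1].split
  all = shingles(words, size)
  words.each_index.map do |i|
    # Shingles covering word i start at positions i - size + 1 through i.
    first = [i - size + 1, 0].max
    last = [i, all.length - 1].min
    covering = (first..last).map { |start| seen[all[start]] }
    earliest = covering.min_by { |entry| entry[:revision] }
    [words[i], earliest && earliest[:author]]
  end
end

revisions = [['alice', 'the quick brown fox'], ['bob', 'the quick red fox']]
p word_authors(revisions, 2)
```

With 2-word shingles the toy example credits "the" and "quick" to alice and "red" to bob. It also credits "fox" to bob, because every shingle covering it is new in his revision; longer shingles and longer texts give each word more context to fall back on.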

This allows us to add some information to a MediaWiki installation:

Pages
We show the authors of a page, and the percentages at which they have contributed. We also offer this information at the section level.

The information we compute is better than doing diffs with a standard MediaWiki install. Diffs are line based, so changing a word in a paragraph that has a single line ending can make it difficult to determine who wrote the paragraph. In addition, since we are looking for authorship of a word across all revisions at once (as opposed to a revision vs. revision comparison with a diff), this technique is immune to copying and pasting of text and reversion of vandalism.

Authors
We create a reputation for authors with 3 dimensions: quantity, quality and value. For all metrics, we assign a 5 star rating. The way this is done is by computing a raw score for all authors and then ranking them. Authors in the top 20% have a 5-star rating in that dimension.

For quantity, we simply calculate the number of words contributed by each author. This includes words that may no longer be in the current revision of the article. Value takes the percentage authorship of a page, and multiplies that by the number of page views (it's valuable if it's being read).
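The scoring and ranking steps can be sketched in Ruby as follows. The post only states that the top 20% of authors get 5 stars; the assumption here is that each further 20% band drops one star, and the method names are hypothetical.

```ruby
# Raw value score for one page: the author's share of the page (0.0 to 1.0)
# multiplied by the number of page views.
def value_score(authorship_fraction, page_views)
  authorship_fraction * page_views
end

# Turn a hash of author => raw score into author => stars (5 down to 1).
# Authors are ranked by score, highest first, and each 20% band of the
# ranking gets one star fewer (an assumption beyond the top band).
# Ties are broken arbitrarily by sort order.
def star_ratings(raw_scores)
  ranked = raw_scores.sort_by { |_author, score| -score }
  total = ranked.length
  ranked.each_with_index.map do |(author, _score), position|
    [author, 5 - (position * 5) / total]
  end.to_h
end

scores = { 'ann' => 50, 'bo' => 40, 'cy' => 30, 'di' => 20, 'ed' => 10 }
p star_ratings(scores)
```

Because the stars come from rank rather than from the raw number, the same function works for quantity, quality and value alike.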

Quality is a lot trickier. We assume that wiki articles are a Darwinian environment. That is, quality text will "survive" in that as the article is edited, other authors will leave the text alone. Poor quality text will be "killed" or deleted by other authors. We determine quality by looking at the percentage of text that is expected to be alive for an author after 8 revisions (you can change 8 to whatever you like, but it works well for our internal wiki). To calculate the percentage alive after 8 revisions, we used a Kaplan-Meier estimate. This was necessary because authors will have phrases which are still "alive" (they are in the current version of an article). So if a word has survived 3 edits and is in the current version of an article, when calculating survival probabilities you don't want to say that it "died" at 3 revisions, but you don't want to lose the data either. Kaplan-Meier handles that for us.
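For readers unfamiliar with Kaplan-Meier, here is a small Ruby sketch of the estimate. It is not SezHoo's code: the observation format (revisions survived, plus a flag saying whether the word was deleted or is still alive) and the method name are assumptions made for the example.

```ruby
# Kaplan-Meier estimate of the probability that a word is still alive after
# +horizon+ revisions. Each observation is [revisions_survived, died], where
# died is false for words still present in the current revision (censored).
def kaplan_meier(observations, horizon)
  death_times = observations.select { |_duration, died| died }
                            .map(&:first).uniq.sort
  death_times.take_while { |t| t <= horizon }.reduce(1.0) do |survival, t|
    # Words still being tracked just before revision count t.
    at_risk = observations.count { |duration, _died| duration >= t }
    deaths = observations.count { |duration, died| died && duration == t }
    survival * (1.0 - deaths.to_f / at_risk)
  end
end

# Two words were deleted (after 2 and 5 revisions); the other three are still
# alive after 3, 8 and 10 revisions and count as censored.
words = [[2, true], [3, false], [5, true], [8, false], [10, false]]
puts kaplan_meier(words, 8)
```

The word that is alive after 3 revisions counts as "at risk" for the deletion at 2 revisions but drops out of the denominator for the deletion at 5, which is exactly the handling of still-living text described above.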

In the end, you get a 5-star rating for each author in the dimensions of quantity, quality and value.



I'll be presenting this work at WikiSym 2008. We're doing some other interesting stuff which we haven't rolled into SezHoo yet, so it's worth coming to my talk even if you've read this blog post.