<h1>
Rooting your Droid Incredible</h1>
I have an old Droid Incredible that I'm hanging onto until the iPhone 5 comes out. In the meantime, I've been fighting with the "<a href="https://www.google.com/search?q=phone+storage+space+is+getting+low">Phone storage space is getting low</a>" issue. Like many people seeing this problem, I have plenty of free space available. I was able to fix it by rooting my phone and running a custom ROM to repartition the phone's storage. Here is how I did it.<br />
<h2>
Disclaimer</h2>
<div>
Rooting your phone may void your warranty. Back up your data before doing this. Also know that I haven't been terribly rigorous in understanding everything that is happening here. You could say that I am <a href="http://en.wikipedia.org/wiki/Cargo_cult">cargo cult</a>-ing some of the instructions. For me, since I am due for a phone upgrade, the worst case scenario is that if I brick my phone, I'll head to the Apple store and pick up a 4S.</div>
<h2>
Overview</h2>
<div>
What this will do is give you root access to the phone. This will allow you to run custom ROMs. Once you have this, you can replace everything on the phone with something like <a href="http://www.cyanogenmod.com/">CyanogenMod</a>. For the time being, I have chosen to stick with the OS shipped on the phone but booted into a custom ROM to fix the issues I am seeing.</div>
<div>
<br /></div>
<div>
The "Phone storage space is getting low" message appears because application files and some data are stored on a partition that is only 150MB; it's mounted at /data/data. Once your apps or that data hit 140MB, you'll get the error and your phone will shut down sync services, meaning you won't see new emails. I found a ROM that repartitions this space to 750MB and haven't had a problem since (roughly a month).</div>
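<div>
If you want to see how close you are to the limit before flashing anything, you can check the usage yourself. Below is a rough Ruby sketch; the df line is a made-up sample of what something like <code>adb shell df /data/data</code> might print (real output varies by device and ROM), so treat the parsing as illustrative only.</div>

```ruby
# Rough sketch: decide whether a phone is in low-storage-warning territory.
# The df line below is a made-up sample; real `adb shell df` output varies
# by device and ROM, so the regex here is illustrative, not authoritative.
sample = "/data/data: 150336K total, 143360K used, 6976K available"

used_kb      = sample[/(\d+)K used/, 1].to_i
threshold_kb = 140 * 1024 # the ~140MB point where the warning appears

if used_kb >= threshold_kb
  puts "warning territory: #{used_kb / 1024}MB used"
else
  puts "ok: #{used_kb / 1024}MB used"
end
```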
<h3>
Gaining Root</h3>
<div>
I use a Mac at home and used <a href="http://unrevoked.com/recovery/">Unrevoked 3</a> to gain root access. The latest version of the software (3.32) did not work; it failed with an error message stating that the firmware was too new. I was able to pull down <a href="http://downloads.unrevoked.com/recovery/3.22/Reflash.dmg">version 3.22</a> by changing the download URL, and that worked fine. You'll need root access to boot the phone into custom ROMs, and you'll need custom ROMs to do the repartitioning.</div>
<h3>
Repartitioning</h3>
<div>
I bought a copy of <a href="https://play.google.com/store/apps/details?id=com.koushikdutta.rommanager&hl=en">ROM Manager</a> from the Play Store. This will let you boot custom ROMs. I used it to boot the <a href="http://dinc.does-it.net/EXT4_Mods/Convert2Ext4_no_data_limit_normal_dalvik.zip">Convert2Ext4_no_data_limit_normal_dalvik</a> ROM. You can find a description of what the ROM does and other variants of it in <a href="http://forum.xda-developers.com/showthread.php?t=1315372">this XDA Developers forum post</a>. You will need to move the ROM image to your phone to boot it. I just downloaded it to my Mac, mounted the phone's SD card as a disk and copied it over.<br />
<h3>
Success</h3>
</div>
<div>
After you run the custom ROM, your phone should now be repartitioned and the low storage space error message should go away. With root, you should be able to do other nice things like remove Verizon bloatware. I haven't tried that yet.</div>
<h3>
Text Messaging</h3>
<div>
After rooting and repartitioning the phone, I noticed that I could send text messages, but I couldn't receive them. This seems to be a relatively common problem among people who do this sort of thing. After some Googling, I came across <a href="https://community.verizonwireless.com/message/689587">this post</a> in the Verizon community forums. Basically, if you go to <a href="http://dl3.htc.com/misc/inc8049.apk">http://dl3.htc.com/misc/inc8049.apk</a> on your phone, it fixes the issue and you can get text messages again. I have no idea what the file does, so download at your own risk. The URL goes to HTC, so I figured it was relatively safe.</div>
<h1>
Debugging MapReduce in MongoDB</h1>
On a project that I am working on, we are doing some pretty intense MapReduce work inside of <a href="http://www.mongodb.org/">MongoDB</a>. One of the things we've run up against is the lack of solid debugging tools. Some Googling basically tells you that <a href="http://groups.google.com/group/mongodb-user/browse_thread/thread/6327a330e69e86ef?tvc=2">print() is all you've got</a>.<div><br /></div><div>We've decided to take a different approach and debug our MapReduce code in the browser. Since the code is JavaScript and modern browsers have really excellent support for debugging (breakpoints, variable inspection, etc.), it's pretty easy to do.</div><div><br /></div><div>All you need is a web app (or even a static HTML file) that will:</div><div><br /></div><div><ol><li>Load up one of your documents that you would like to map in the browser. Since the documents are JSON, this is easy. In our project, we have JSON fixture files and a small web app that allows you to choose which fixture to use for testing.</li><li>Mock the emit() method.
You can just have it write to a Hash that you can inspect later.</li><li>Load up the Map and Reduce functions. If you keep these in separate .js files, you can pull them in with a simple script tag.</li><li>Bind the map function to the document so that it has the correct context. In a MongoDB mapper function, "this" is set to the document that you are mapping. You can easily do this with the <a href="http://documentcloud.github.com/underscore/#bind">bind()</a> function in Underscore.js. I'm sure that other JavaScript frameworks provide a similar function.</li><li>Put a link on the page that will let you run the bound function.</li></ol><div>This will emulate the MongoDB MapReduce environment, but you can now use the browser's debugging tools.</div></div>
<h1>
Using Underscore.js with MongoDB</h1>
I've been using <a href="http://www.mongodb.org/">MongoDB</a> for a while now and have been really happy with it. I wanted to share something we are doing on one of the projects I work on that makes working with Mongo even better.<div><br /></div><div>MongoDB allows for the use of JavaScript to do lots of work on the server side. This includes running MapReduce jobs on collections, but it can also be used in <a href="http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-JavascriptExpressionsand%7B%7B%24where%7D%7D">where clauses</a> and for doing <a href="http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Group">grouping</a>. Being able to use JavaScript for these things is handy, but using just the core JavaScript language can be less than ideal. That's why we prime our MongoDB environment with <a href="http://documentcloud.github.com/underscore/">Underscore.js</a>.</div><div><br /></div><div>On the Underscore website, it claims to be a JavaScript utility belt.
I've found that to be the case. It has functions like <a href="http://documentcloud.github.com/underscore/#any">any</a> or <a href="http://documentcloud.github.com/underscore/#include">include</a> that save you the trouble of having to write for loops to iterate over arrays. While the MongoDB documentation describes how you can <a href="http://www.mongodb.org/display/DOCS/Server-side+Code+Execution#Server-sideCodeExecution-Storingfunctionsserverside">store individual functions for server side use</a>, it didn't really touch on how you could load an entire library like Underscore.</div><div><br /></div><div>It turns out you can load up libraries like this pretty easily using db.eval(). I recall reading (but can't currently find the docs to prove this) that every MongoDB connection has a JavaScript context associated with it. If you create functions in this context, they will exist as long as the connection is around. So if you just eval the Underscore.js library before you do any work with your connection, you will have access to all of its functions to do your work.</div><div><br /></div><div>Here is an example of how to use Underscore.js with the Ruby driver. In this example, I'll set up the MongoDB connection with Underscore.js, create a sample dataset of cars, then use Underscore to group them by make without repeating model.</div><br /><br /><script src="https://gist.github.com/741573.js?file=gistfile1.rb"></script><br /><br /><div>The only downside to this approach is that db.eval does not seem to work with sharding. That is OK for me right now, but YMMV.
Also note that I am using the <a href="https://github.com/michaeldv/awesome_print">awesome_print</a> gem to pretty print the results.</div>
<h1>
Git rm may cause insanity</h1>
Ran across this today, and wanted to help others avoid the same fate. If you use git rm to remove the last file in a directory, it will remove the directory as well. If you are in that directory, odd things can happen that will potentially drive you insane.<div><br /></div><div>Let's create a git repository with a folder that has a single file in it:</div><div><pre>$ cd /tmp<br />$ mkdir foo<br />$ cd foo/<br />$ git init<br />Initialized empty Git repository in /private/tmp/foo/.git/<br />$ mkdir bar<br />$ cd bar/<br />$ echo 'hello' > splat.txt<br />$ git add splat.txt<br />$ git ci -m 'adding a text file'<br />[master (root-commit) 069f11b] adding a text file<br />1 files changed, 1 insertions(+), 0 deletions(-)<br />create mode 100644 bar/splat.txt<br /></pre></div><br /><div>So I now have a git repository and have placed the file bar/splat.txt under revision control. Now if I do:</div><div><pre>$ git rm splat.txt<br />rm 'bar/splat.txt'</pre></div><div>This will not only remove splat.txt, but it will also remove the whole bar directory. I say this will drive you insane because if you try to move or copy a file into your current directory, you'll get an error that will probably catch you off guard. Like:</div><div><pre>$ cp ~/.gitignore .<br />cp: ./.gitignore: No such file or directory</pre></div><div>There is a file called .gitignore in my home directory; it's just that my current directory no longer exists. It took me about five minutes to realize what was going on... 
and I was starting to wonder if I knew how to use the cp command.</div><div><br /></div><div>The reason I ran into this is that I was rearranging my .vim folder to use <a href="http://www.vim.org/scripts/script.php?script_id=2332">pathogen</a>. I keep all of my dot files under source control and stumbled upon this while clearing out my vim autoload folder.</div>
<h1>
Gwibber on Ubuntu 10.04 issues with FiOS</h1>
I just upgraded one of my machines to the latest and greatest Ubuntu because I plan on taking it on the road later this week. After I got everything set up, I fired up <a href="https://launchpad.net/gwibber">Gwibber</a>, my favorite Twitter client on Linux. Immediately, I started running into problems. I couldn't get Gwibber to load any new tweets. There seem to be several people who are experiencing this issue with Gwibber, but their troubles are related to language settings. That was not the case for me.<div><br /></div><div>I did some digging by firing up Gwibber in a terminal:</div><div><br /><pre><br />$> gwibber-service -o -d<br /></pre><br /></div><br /><div><br />This is what I got:<br /></div><br /><div><br /><script src="http://gist.github.com/404503.js?file=gistfile1.txt"></script><br /></div><br /><div><br />Gwibber wasn't refreshing because it was timing out on the DNS lookup. I have Verizon FiOS as an ISP. FiOS having terrible DNS <a href="http://www.google.com/search?aq=f&ie=UTF-8&q=fios+dns+slow">seems to be a common problem</a>. I switched over to <a href="http://code.google.com/speed/public-dns/">Google DNS</a> and everything is snappy and working properly. If you're using Gwibber on FiOS and having issues, try this out. 
It may save you an hour or three.</div><div><br /></div><div>Gwibber, or whatever library it is using for network communication, picked a pretty short DNS timeout, which is what exposed the problem. Verizon really needs to step it up here. People will perceive FiOS as slow because it takes forever to look up an IP, even though the network itself is pretty quick in my experience.<br /></div>
<h1>
Using xargs with git</h1>
Sometimes, when I'm working on a project, I'll create a bunch of new files and realize that I have a ton of untracked stuff that I need to add to my git repository. Since I generally only use git on the command line, it would be painful to copy and paste all of the untracked file names from the output of <span style="font-weight: bold;">git status</span> into separate <span style="font-weight: bold;">git add</span> commands.<br /><br />The two commands I have found handy for dealing with this situation are <span style="font-weight: bold;">git ls-files</span> and <span style="font-weight: bold;">xargs</span>.<br /><br />If you run the command:<br /><pre>git ls-files -o</pre>It will show you all of the untracked files in your working directory, one file per line. A problem that you will run into here is that it also shows files matched by your <span style="font-weight: bold;">.gitignore</span>. To get around this issue, you just need another argument to specify your <span style="font-weight: bold;">.gitignore</span>:<br /><pre>git ls-files -o --exclude-per-directory=.gitignore</pre>Now that you have all of the files you want to add, you just need to run <span style="font-weight: bold;">git add</span> on all of them. This is where <span style="font-weight: bold;">xargs</span> comes in handy. 
It will read from standard input, break it up on line endings, and then feed each line as an argument to another command. Putting it all together, you get: <pre>git ls-files -o --exclude-per-directory=.gitignore | xargs git add</pre>That last command will add any untracked files to your git repository. The beautiful thing here is that we can also leverage some UNIX-y goodness if we want to. Let's say we're working on a project and we only want to commit some XSLT we have been working on. You can do this by throwing <span style="font-weight: bold;">grep</span> into the command chain: <pre>git ls-files -o --exclude-per-directory=.gitignore | grep xslt | xargs git add</pre>This will only add files that contain "xslt" in their names. This same approach comes in handy when you remove files from your working copy but forget to run <span style="font-weight: bold;">git rm</span>.
<h1>
Classy hData</h1>
<div>I've been working on a team that is looking at ways in which we can simplify the exchange of information in Health IT. This effort is called <a href="http://www.projecthdata.org/">hData</a>. We just released a <a href="http://www.projecthdata.org/">new version of our packaging and network transport spec</a>, and I would like to talk a bit about how we arrived at this version.</div><br /><div>I think it is really important for IT specifications to have a reference implementation available. If you build a spec without code, it's really hard to see where you have gone wrong. To make sure we are on the right track, I built a small web application that implements the spec. I was able to quickly uncover some bugs in our work. 
Bugs I'm sure we would have missed by just reading the document.</div><br /><div><span class="Apple-style-span" style="font-size:x-large;"><b>Technology Choices</b></span></div><br /><div>Since my language of choice is Ruby, it would be natural to think I would want to tackle this project in Rails. However, in hData we make some good use of the HTTP verbs, and I'm not so sure that they would line up seamlessly with Rails conventions. I decided to go with a much simpler choice. <a href="http://www.sinatrarb.com/">Sinatra</a> is a small web framework that seems perfect for this job. It makes the HTTP verbs central to your code, so it should be fairly obvious how we go from the spec to implementation.</div><br /><div>There are a few other tools that I used on this adventure. <a href="http://datamapper.org/">DataMapper</a> was just right for the ORM needs of the project. I could have used ActiveRecord to persist data, but DataMapper has a really nice auto-migration feature, which will save me from writing all of the database creation code. I also used <a href="http://github.com/wycats/bundler">Bundler</a> to manage my application's dependencies.</div><br /><div><span class="Apple-style-span" style="font-size:x-large;"><b>Getting Started</b></span></div><br /><div>The best way to get started here is by taking a test driven approach to the spec. For that I will be using <a href="http://github.com/thoughtbot/shoulda">Shoulda</a> and <a href="http://github.com/brynary/rack-test">Rack Test</a>. With my TDD tools in place, I can take part of the spec that looks like this:</div><br /><blockquote><br />3.1.2 POST<br /><br />3.1.2.1 Parameters: type, typeId, requirement<br /><br />For this operation, the value of type MUST equal "extension". The typeId MUST be a URI string that represents a type of section document. The requirement parameter MUST be either "optional" or "mandatory". 
If any parameters are incorrect or not existent, the server MUST return a status code of 400.<br /><br />If the system supports the extension identified by the typeId URI string, this operation will modify the extensions node in the root document and add this extension with the requirement level identified by the requirement parameter. The server MUST return a 201 status code.<br /><br />If the system does not support the extension, it MUST not accept the extension if the requirement parameter is "mandatory" and return a status code of 409. If the requirement is "optional" the server MAY accept the operation, update the root document and send a status code of 201.<br /><br />Status Code: 201, 400, 409<br /></blockquote><br />and turn it into a Shoulda context block. In the spec above, we're talking about what should happen when you POST to the root of an hData Record. The functionality being described here is how an extension can be added to the record, or how you can register a different type of thing for a record. For example, you could use this feature to add a medications extension to a record, if one did not exist there already. In our test code, we're going to try to register an allergies extension:<br /><script src="http://gist.github.com/241617.js"></script><br /><div>As you can see from the code, the combination of Shoulda and Rack Test makes it really easy to express the requirements set forth in the specification. The first test tries to POST an incomplete request and should receive an error. The second sends a properly formed request and should get an appropriate response. The last test tries to POST a duplicate extension.</div><br /><div>With the tests in place, we can move on to implementation.</div><br /><div>I have created a DataMapper Resource to capture all of the information we want to store about an extension. I will also use the validation framework of DataMapper to make sure that all of the requirements for an extension are met. 
I end up with the following code:</div><br /><script src="http://gist.github.com/241619.js"></script><br /><div>With my model in place, I can implement the code to handle the web request:</div><br /><script src="http://gist.github.com/241623.js"></script><br /><div>The code above is pretty typical for Sinatra. The post block handles POSTs to the root URL. There I call a method to check and make sure that the type parameter is set. If it isn't, I halt the processing and let the user know that the request is malformed with a 400 code. If the type is set to extension, then we drop into the handle_extension method. Inside of the method, I build an Extension object and check it using the DataMapper validation framework.</div><br /><div>There is a little bit of funkiness at the end of the handle_extension method where I need to check the type of error. This is because I need to return different status codes depending on the error. Unfortunately, with the DataMapper validations, I didn't see any way to return anything with the errors other than a text message, so this seemed like the best way of doing things.</div><br /><div>The handle_section call at the end of the post block handles another part of the spec. Don't worry, I didn't write it until I had the tests done first.</div><br /><div><span class="Apple-style-span" style="font-size:x-large;"><b>Lather, Rinse, Repeat</b></span></div><br /><div>Implementing the rest of the hData Packaging and Transport spec followed the same process. Take the spec and write a matching unit test. Implement the spec and refine the code until the test passes.</div><br /><div>In doing this, I found a couple of bugs in our spec. We hadn't provided parameter names for POSTing section documents. Our description of how to add metadata to documents was ambiguous at best. 
The nice part was that I was able to discover these things before even digging into the implementation.</div><br /><div><span class="Apple-style-span" style="font-size:x-large;"><b>What still needs to be done</b></span></div><br /><div>While the Sinatra app that I wrote is a pretty good implementation of the hData Packaging and Transport spec, it still has some gaps. It doesn't support POSTing metadata on documents; it only creates and serves its own. It also doesn't support nested sections, but that shouldn't be too hard to add.</div><br /><div><span class="Apple-style-span" style="font-size:x-large;"><b>Wrap Up</b></span></div><br /><div>You can find the code at <a href="http://github.com/eedrummer/classy-hdata">eedrummer/classy-hdata on github</a>. Even if you aren't interested in hData, this application should serve as an example Sinatra/DataMapper application. If you dig into the code and the hData spec, I think you'll see that hData is really easy to implement, especially in a classy framework like Sinatra.</div>
<h1>
Teaching at the Ruby on Rails Workshop for Women</h1>
Today I taught the beginner class at <a href="http://blogs.law.harvard.edu/genderandtech/ruby-on-rails-workshop-for-women/">the Ruby on Rails Workshop for Women</a>. First of all, I want to thank <a href="http://twitter.com/mtolbert">Mary Tolbert</a> for getting me involved. Also, a big thanks to <a href="http://www.ultrasaurus.com/">Sarah Allen</a> and <a href="http://blogs.law.harvard.edu/lianaleahy/">Liana Leahy</a> for putting things together and giving me a chance to say that "I taught a class at Harvard."<br /><br />I was a bit intimidated about teaching the class at first. I was upgraded from TA status to teacher mid-week and was worried that I wouldn't know what to say. 
<a href="http://twitter.com/MegSecatore/status/4943625775">Maybe I was right to be worried</a>. Also, many of the TAs were rockstar rubyists, and I have to admit that I felt a little silly speaking authoritatively in front of them.<br /><br />Once I got into the material, I started to feel pretty comfortable. Watching people have a-ha moments is really rewarding. Trying to explain programming without falling back on computer science terms is kinda tricky on the fly. Maybe if I do something like this again, I can speak on things a little more smoothly.<br /><br />It's pretty incredible how much the students were able to pick up in a day. I know that we had some people in the class who had never written code in their lives, and it was really cool to see them hacking in just a few hours.<br /><br />If I were to do it over again, I would suggest just a plain Ruby class for beginners. Or maybe a two-day class where Rails is covered on the second day. When I thought that the students were getting comfortable with Ruby, we only had one session left. I think that everyone was with me when we were modifying the default index page in Rails, but I wanted to make sure that the students saw the Ruby bits too. So I rushed through controllers and views in 15 minutes, and I would be surprised if anyone got anything out of it. It probably would have been best to punt on the controllers and just explain a little about HTML and CSS.<br /><br />I noticed that for a lot of beginners, application switching was killer. Switching between the terminal, browser and text editor is second nature for me, but not so much for those just getting started. I wish I had a ton of screen real estate to keep all three visible at once, but I don't know if that would actually help.<br /><br />We didn't get to really cover git or Heroku in the class. I was able to tell the students enough to use them, but not understand them. The localhost vs. 
Heroku distinction was definitely a stumbling block for some. For beginners, I might punt on Heroku and just work from localhost without any source control, just to get started.<br /><br />I got a <a href="http://twitter.com/lleahy/status/4944508389">ton</a> <a href="http://twitter.com/karabee/status/4948933933">of</a> <a href="http://twitter.com/atrzop/status/4949055440">positive</a> <a href="http://twitter.com/jwunder/status/4943643978">feedback</a> on Twitter and in person. This was super encouraging. I've always been considering teaching at some level later in my career, so it seems like I am on the right track there.<br /><br />Overall, it was an awesome experience. One thing I was surprised to hear is how intimidating it is for women to participate in local open source meetups. I suppose that being a 6' 2" person who has played contact sports leaves me in a position where I am generally not too scared of software developers. But getting up in front of the class today with all of the TAs looking at me, I think I'm starting to get it. Hopefully, events like these can start to even out the gender balance so that more women feel comfortable participating in the Ruby and FOSS community.
<h1>
Building Tokyo Cabinet for use with Java on OS X</h1>
I've been really interested in playing with <a href="http://tokyocabinet.sourceforge.net/">Tokyo Cabinet</a> lately. I thought that it would be fun to take a hack at the <a href="http://contest.github.com/">GitHub Contest</a> using <a href="http://www.scala-lang.org/">Scala</a> and Tokyo Cabinet. I then set out to build Tokyo Cabinet and its Java bindings (since I can call those easily from Scala). The Java bindings for Tokyo Cabinet are not pure Java; they use JNI, so you need to compile some C as well as Java. 
Everything looked fine and dandy until I tried to run some code. I then ran into this stack trace from Scala:<br /><br /><script src="http://gist.github.com/170118.js"></script><br /><br />To translate, what is going on here is that by default Tokyo Cabinet will build 32-bit binaries. Java 1.6 on OS X is 64-bit and will look for a 64-bit version of the library. Here is what I did to make things happy.<br /><br />When running the configure script for Tokyo Cabinet itself, I added a flag:<br /><br /><script src="http://gist.github.com/170124.js"></script><br /><br />I tried the same trick when configuring the Java bindings, but it didn't seem to end up in the resulting Makefile. So I edited the Makefile by hand. In the end, my CFLAGS line looks like this:<br /><br /><script src="http://gist.github.com/170125.js"></script><br /><br />After that, I was able to get a small Scala script to create a Hash database.<br /><br />As an aside, the <a href="http://www.scala-lang.org/node/94">Scala IDE for Eclipse</a> seems really nice. I had tried it out a few months ago, and it has clearly made a lot of progress since then.
<h1>
Glassfish is looking speedy</h1>
Let me start by saying that most benchmarks should be taken with a grain of salt, so please do the same here.<br /><br />I work on <a href="http://laika.sourceforge.net/">Project Laika</a>, and for our deployments we are looking to switch to <a href="http://jruby.codehaus.org/">JRuby</a>. We already have Java code hanging around, and it looks like we will need some more soon (I'm thinking it will be easier to deal with SOAP web services in Java and call the classes from JRuby). 
I wanted to run some numbers to make sure that our performance wouldn't fall through the floor.<br /><br /><span style="font-weight: bold;font-size:130%;" >The Setup</span><br /><br /><span style="font-size:100%;">I decided to pit Mongrel 1.1.5 running on MRI 1.8.6 against <a href="https://glassfish.dev.java.net/">Glassfish v3 Prelude 1</a> and JRuby 1.1.5 on Java 5. I'm running Rails 2.0.2 in both setups (I know we are behind the times). I ran them both on OS X 10.5.5 and had the Rails apps hit the same MySQL database.<br /><br />I used ab to grab the numbers. I had it hit the site 1000 times with 10 concurrent requests. I hacked the Laika app slightly so that you didn't have to log in.<br /><br /><span style="font-size:130%;"><span style="font-weight: bold;">The Test</span></span><br /><br />I wanted to get a feel for how the app would perform, so I did two simple tests: dynamic content from a typical page and static content. The dynamic content was the patient template library in Laika, which contains code that you'd expect in a Rails app: AR pulling info from the DB and putting it into ERB templates. I also pulled down the Rails 404 page to get a feel for serving static content. This is less meaningful, as you'll probably have Apache or Nginx serve up your static stuff.<br /><br /><span style="font-weight: bold;"><span style="font-size:130%;">The Results</span></span><br /><br />Glassfish, hands down. 
The results are the average number of milliseconds it took to serve each request.<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-py9pRwVDXUMY-jmM_8f6M66jEUkUt8g6m3yRFS7zJQMvOqzTjyTYfc599mkFPkC5dDvOKbua2mg5p9NB5GNtZ3CVy2nB1pE8IwiGI78vGxKyTMdcvoQmwK0YL6cPSS-em6N-j9yUvsc/s1600-h/laika-benchmarks.png"><img style="cursor: pointer; width: 400px; height: 241px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-py9pRwVDXUMY-jmM_8f6M66jEUkUt8g6m3yRFS7zJQMvOqzTjyTYfc599mkFPkC5dDvOKbua2mg5p9NB5GNtZ3CVy2nB1pE8IwiGI78vGxKyTMdcvoQmwK0YL6cPSS-em6N-j9yUvsc/s400/laika-benchmarks.png" alt="" id="BLOGGER_PHOTO_ID_5271546187707260402" border="0" /></a><br /><br />It beat Mongrel in both static and dynamic content easily. Glassfish v3 also makes it ridiculously easy to deploy Rails apps. You can use the <a href="http://glassfishgem.rubyforge.org/">Glassfish gem</a> and serve up your app with a single command. I installed the Glassfish server so I could run JEE apps alongside my Rails stuff; there is a single command where you point the app server to the root directory of your app and you're done.<br /><br />With Rails 2.2 now out and offering thread safety, and JRuby being the only interpreter that can take advantage of it... Glassfish and JRuby are really worth checking out.<br /></span>
<h1>
SezHoo and WikiSym 2008</h1>
I'm proud to announce the release of <a href="http://sezhoo.wiki.sourceforge.net/MainPage">SezHoo</a>! It's a tool we created at MITRE to help establish reputations for authors in wiki sites... 
specifically those run on <a href="http://www.mediawiki.org/wiki/MediaWiki">MediaWiki</a> (the same software that runs Wikipedia).<br /><br />The way that SezHoo works is that it goes through a wiki article's history and tracks the authorship of each word in the article. We use a pretty popular technique in text analysis called <a href="http://en.wikipedia.org/wiki/W-shingling">shingling</a> to help us out.<br /><br />For each revision of an article, we break it up into shingles. We're using shingles that are 6 words long, so each word will have up to 6 shingles associated with it. We credit authorship of the word to the author of the earliest shingle.<br /><br />This allows us to add some information to a MediaWiki installation:<br /><br /><span style="font-size:180%;">Pages</span><br />We show the authors of a page, and the percentage each has contributed. We also offer this information at the section level.<br /><br />The information we compute is better than what you get from diffs in a standard MediaWiki install. Diffs are line-based, so changing one word in a paragraph that is stored as a single line can make it hard to tell who wrote the rest of the paragraph. In addition, since we are looking for authorship of a word across all revisions at once (as opposed to a revision vs. revision comparison with a diff), this technique is immune to copying and pasting of text and reversion of vandalism.<br /><br /><span style="font-size:180%;">Authors</span><br />We create a reputation for authors with 3 dimensions: quantity, quality and value. For all metrics, we assign a 5-star rating. This is done by computing a raw score for all authors and then ranking them. Authors in the top 20% have a 5-star rating in that dimension.<br /><br />For quantity, we simply calculate the number of words contributed by each author. This includes words that may no longer be in the current revision of the article.
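Both the per-word attribution and the quantity count fall out of the shingle bookkeeping described above. Here's a toy Ruby sketch of the attribution step — it uses 3-word shingles to keep the example short (SezHoo itself uses 6), and the real implementation handles plenty of cases this sketch ignores:

```ruby
SHINGLE = 3 # SezHoo uses 6-word shingles; 3 keeps the toy example readable

# All overlapping word n-grams of a word list.
def shingles(words)
  return [] if words.size < SHINGLE
  (0..words.size - SHINGLE).map { |i| words[i, SHINGLE].join(' ') }
end

# revisions: ordered [[author, text], ...]. Credits each word of the
# latest revision to the author of the earliest revision in which any
# shingle covering that word first appeared.
def word_authors(revisions)
  first_rev = {}
  revisions.each_with_index do |(_, text), r|
    shingles(text.split).each { |s| first_rev[s] ||= r }
  end
  words = revisions.last[1].split
  shs = shingles(words)
  words.each_index.map do |i|
    lo = [0, i - SHINGLE + 1].max          # shingles covering word i
    hi = [i, shs.size - 1].min
    earliest = (lo..hi).map { |j| first_rev[shs[j]] }.min || revisions.size - 1
    revisions[earliest][0]
  end
end

revisions = [
  ['alice', 'the quick brown fox jumps'],
  ['bob',   'the quick brown fox jumps high today']
]
# Words surviving from alice's revision stay credited to alice;
# bob only gets credit for the words his new shingles introduced.
p word_authors(revisions)
```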
Value takes an author's percentage authorship of a page and multiplies it by the number of page views (it's valuable if it's being read).<br /><br />Quality is a lot trickier. We assume that wiki articles are a Darwinian environment. That is, quality text will "survive": as the article is edited, other authors will leave the text alone. Poor quality text will be "killed" or deleted by other authors. We determine quality by looking at the percentage of an author's text that is expected to still be alive after 8 revisions (you can change 8 to whatever you like, but it works well for our internal wiki). To calculate the percentage alive after 8 revisions, we used a <a href="http://www.cancerguide.org/scurve_km.html">Kaplan-Meier</a> estimate. This is necessary because authors will have phrases that are still "alive" (they are in the current version of an article), so the data is censored. If a word has survived 3 edits and is in the current version of an article, you don't want to say that it "died" at 3 revisions when calculating survival probabilities, but you don't want to lose the data either. Kaplan-Meier handles that for us.<br /><br />In the end, you get a 5-star rating for each author in the dimensions of quantity, quality and value.<br /><br /><img src="http://www.wikisym.org/ws2008/images/2/29/WikiSym2008_banner-468x50.jpg" /><br /><br />I'll be presenting this work at <a href="http://www.wikisym.org/ws2008/index.php/Main_Page">WikiSym 2008</a>. We're doing some other interesting stuff which we haven't rolled into SezHoo yet, so it's worth coming to my talk even if you've read this blog post.Anonymoushttp://www.blogger.com/profile/07819081702916184088noreply@blogger.com35tag:blogger.com,1999:blog-5993287118648755621.post-22434333493369302872008-08-13T21:24:00.004-04:002008-08-13T22:22:16.457-04:00Using HadoopSince I've gotten back from OSCON, I've had a chance to use <a href="http://hadoop.apache.org/core/">Hadoop</a> at work.
For those who aren't familiar with it, Hadoop is an open source framework for implementing <a href="http://labs.google.com/papers/mapreduce.html">map reduce</a> jobs.<br /><br />There are plenty of tutorials on Hadoop around the web, so I won't do any of the basic intro stuff. I wanted to write about some of the stuff I didn't find all that easily.<br /><br />Most of the Hadoop documentation talks about dealing with input where your records are on a single line (like an Apache access log). From Googling, reading the documentation, and our own experience, we have found that Hadoop works just fine with multi-line records. We're using Hadoop to process XML, specifically a dump from Wikipedia. The dump is a single 33 GB file, where there is a root tag, and then several million child tags (representing Wikipedia pages). Using <a href="http://www.nabble.com/Re%3A-map-reduce-function-on-xml-string-p15835195.html">this code</a> I found on the Hadoop core user mailing list, we can have it so that the mapper gets the XML for one child node (or one Wikipedia page). This is nice, because the XML for a single page is relatively small. We then use JDOM to deal with the contents of the individual pages.<br /><br />We are using <a href="http://hadoop.apache.org/core/docs/r0.17.1/hdfs_design.html">HDFS</a> to store our input and output. By default, it will chop files into 64MB chunks, which get shipped to the mappers so that the file can be processed in parallel. One thing that I was concerned about was how records that span the splits would be handled. So far, we haven't seen any issues, and <a href="http://wiki.apache.org/hadoop/FAQ#23">this answer</a> in the Hadoop FAQ seems to indicate that records spanning a split will be handled. It may be that those records get dropped; it would be hard for us to tell at this point... but the good news for us is that it wouldn't affect the data we're trying to collect much.<br /><br />As for our setup and usage...
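(First, a quick aside on the record splitting: the per-record framing described above can be sketched in Ruby. This is a hypothetical toy analogue of the linked XML record reader — the real job used that Java code plus JDOM — buffering lines until a whole <page> element has been seen:)

```ruby
require 'stringio'

# Yield each complete <page>...</page> element as one record, even though
# it spans many lines -- the same per-record framing the XML record reader
# gives the mappers. Assumes the open/close tags sit on their own lines,
# as they do in the Wikipedia dumps.
def each_page(io)
  buffer = nil
  io.each_line do |line|
    buffer = +'' if line.include?('<page>')
    buffer << line unless buffer.nil?
    if buffer && line.include?('</page>')
      yield buffer
      buffer = nil
    end
  end
end

# A tiny stand-in for the 33 GB dump.
dump = StringIO.new(<<~XML)
  <mediawiki>
    <page>
      <title>Hadoop</title>
    </page>
    <page>
      <title>MapReduce</title>
    </page>
  </mediawiki>
XML

each_page(dump) { |page| puts page[/<title>(.*?)<\/title>/, 1] }
# prints "Hadoop" then "MapReduce"
```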
We have 5 machines in our cluster, most of which are run-of-the-mill dual-core Intel machines running Linux. The jobs we're running on the 33GB XML file are taking around 45 minutes, which seems pretty fast to me.Anonymoushttp://www.blogger.com/profile/07819081702916184088noreply@blogger.com41tag:blogger.com,1999:blog-5993287118648755621.post-25330037490157004392008-07-28T21:21:00.004-04:002008-07-28T21:44:29.252-04:00OSCON 2008: Wrap UpSo I've returned from Portland after all of the OSCON activities. The conference was good, but I definitely didn't feel like it was as good as in years past. The keynotes were OK, but there weren't any that were spectacular. I hit a few sessions that were bombs, but I didn't get to one that rocked. Many were good, but nothing was off the charts.<br /><br /><a href="http://hadoop.apache.org/core/">Hadoop</a> was big on Thursday. Derek Gottfrid of the New York Times talked about how they used Hadoop and Amazon EC2 to process tons of data. Derek's presentation style is great, which made the talk entertaining. Some folks from Yahoo also got into the nitty-gritty details of how the whole thing works.<br /><br />The <a href="http://forge.mysql.com/wiki/MySQL_Proxy">MySQL Proxy</a> talk was good. It seems like a pretty handy tool for performance tuning and all sorts of SQL trickery.<br /><br />The last talk that stood out to me was <a href="http://selectricity.org/">Selectricity</a>. The project is both a site to run elections and software you can use to run elections wherever you want. One point that Benjamin Mako Hill made that I thought was interesting is that most election research goes into government elections... and these are the least likely to change. By building a tool that lets folks conduct elections for simple things (what movie to see, who will lead the coffee club, etc.) using methods other than plurality, it's a good way to introduce alternative voting methods to the masses.
That way, if people get familiar with <a href="http://en.wikipedia.org/wiki/Condorcet_criterion">Condorcet</a> when voting for the next American Idol, they may be more likely to push for election reform in government elections.<br /><br />I'm not sure if I'll hit OSCON again next year. I like going because it's nice to get an overview of a lot of different technologies, as opposed to something like RailsConf. But things did feel pretty shallow this year.Anonymoushttp://www.blogger.com/profile/07819081702916184088noreply@blogger.com34tag:blogger.com,1999:blog-5993287118648755621.post-37318992434177713172008-07-24T00:50:00.003-04:002008-07-24T01:09:38.286-04:00OSCON 2008: Day 3Today, I gave my first OSCON talk, on <a href="http://en.oreilly.com/oscon2008/public/schedule/detail/2889">Laika</a>. I think that the talk went pretty well. I had plenty of good questions from the audience, and I think I may have been able to snag a few people who were interested in contributing.<br /><br />The speaking experience was pretty cool. I was in a fairly small room, and probably had about 20 to 30 people in the audience, which was pretty non-threatening. I would have been more nervous in one of the more cavernous rooms with 200 people.<br /><br />As for the rest of the conference today...<br /><br />XMPP has a lot of buzz for communicating in the cloud. There were a few talks on that today.<br /><br />There is a lot of Ruby stuff going on outside of web apps. <a href="http://rad.rubyforge.org/">RAD</a> seems to have a lot of buzz for using Ruby to work with Arduino. Ruby's also behind <a href="http://adhearsion.com/">Adhearsion</a>, a tool for building IVRs.<br /><br />I missed the talk on <a href="http://incubator.apache.org/couchdb/">CouchDB</a>, but some of the folks I'm out here with said it was great.<br /><br />On a conference logistics note... I was kinda bummed that some of the talks had filled up, so I couldn't get in.
I wound up missing the talk on Hypertable as well as one on XMPP in the cloud.Anonymoushttp://www.blogger.com/profile/07819081702916184088noreply@blogger.com21tag:blogger.com,1999:blog-5993287118648755621.post-9497706916965988862008-07-23T01:50:00.002-04:002008-07-23T02:20:51.372-04:00OSCON 2008: Day 2Day 2 at OSCON.... Some of the highlights...<br /><br /><a href="http://en.oreilly.com/oscon2008/public/schedule/detail/3373">Practical Erlang Programming</a> was excellent. Francesco Cesarini is a great speaker and delivered a well-paced tutorial. While Erlang does make you look at things differently, I can see how it makes it a lot easier to write concurrent code.<br /><br />While I was psyched to see Mark Shuttleworth give a keynote (given my fondness for Ubuntu), the best keynotes tonight were definitely Robert (r0ml) Lefkowitz and Damian Conway.<br /><br />R0ml's talk compared various software development methodologies to Quintilian's 1st century works on rhetoric. My take on his talk was that open source software has a good development methodology since it doesn't really have a requirements phase. Code gets released early and often. Bugs are filed and patches are submitted. Then users and developers can look at the bugs and patches to determine what goes into the next release. This is different from a typical development methodology, where you need to decide what you want up front. In this model, people do what they want, and you take what you like in the end.<br /><br />On the other hand, Damian Conway is somewhere between insane and brilliant. His talks are hilarious, but the stuff that he is actually able to implement is crazy...
I'm sure that we'll be seeing some talk of <a href="http://flickr.com/photos/gunnarwolf/1343377956/">positronic variables</a> on the tubes in the coming days.Anonymoushttp://www.blogger.com/profile/07819081702916184088noreply@blogger.com32tag:blogger.com,1999:blog-5993287118648755621.post-84426426424801587232008-07-22T01:27:00.002-04:002008-07-22T02:03:00.418-04:00OSCON 2008: Day 1The first day at OSCON 2008 has come and gone... This year looks to be another good one. Here's what I saw on my first day.<br /><br />The first session I went to was <a href="http://en.oreilly.com/oscon2008/public/schedule/detail/2488">Python in 3 Hours</a>. While I do most of my work in Ruby, I do try to keep an eye on Python. It seems like a pretty clean scripting language, and it's quite speedy when compared to Ruby.<br /><br />The tutorial was good. The material is kinda dry (it's language syntax after all, which is pretty hard to spice up), but Steve Holden's presentation was clear and well thought out. I walked away feeling like I could approach Python code now without too much fear. However, I still have some pretty mixed feelings about Python... There are a lot of little things that bother me: having to add a self parameter to instance methods, the double-underscore naming conventions and the whole significant whitespace thing. At any rate, I thought the tutorial was informative.<br /><br />The second tutorial I did was <a href="http://en.oreilly.com/oscon2008/public/schedule/detail/4493">Making Things Blink: An Introduction to Arduino</a>. This was a lot of fun. I haven't played with a microcontroller since college... but I've always loved working at the place where software meets hardware.<br /><br />In the session, we worked through coding for Arduino as well as some basic circuits. The class culminated in building an Etch-a-Sketch. This is accomplished by hooking up two potentiometers to the Arduino, which reads the values and passes them to your computer via USB.
We then used <a href="http://processing.org/">Processing</a> to read and visualize the data on the screen. This meant that you could turn the knobs on the pots and draw on the screen. Pretty cool stuff.<br /><br />Overall, one of the vibes I'm getting from the conference this year is big data: how to deal with really big databases and how to process tons of data in parallel. We'll see if this continues throughout the conference.Anonymoushttp://www.blogger.com/profile/07819081702916184088noreply@blogger.com18tag:blogger.com,1999:blog-5993287118648755621.post-90969621546261357102008-06-05T22:17:00.002-04:002008-06-05T22:20:08.808-04:00Speaking at OSCONI'll be speaking about the work I'm doing on <a href="http://en.oreilly.com/oscon2008/public/schedule/detail/2889">Project Laika at OSCON</a>. Come check out the session to see how we're using Rails to test electronic health record systems.<br /><br /><a href="http://conferences.oreilly.com/oscon"><br /><img src="http://conferences.oreillynet.com/banners/oscon/speaker/oscon2008_banner_speaker_210x60.gif" width="210" height="60" border="0" alt="OSCON 2008" title="OSCON 2008" /><br /></a>Anonymoushttp://www.blogger.com/profile/07819081702916184088noreply@blogger.com22tag:blogger.com,1999:blog-5993287118648755621.post-3065109308095971032008-05-21T22:10:00.004-04:002008-05-21T22:31:52.428-04:00DBSlayer is taking JSON to the next levelI ran across <a href="http://code.nytimes.com/projects/dbslayer">DBSlayer</a> on the <a href="http://railsenvy.com/">Rails Envy</a> podcast and I was struck by a number of things. One is that it is really cool that the New York Times has their own <a href="http://code.nytimes.com/">open code repository</a>. Another is how far JSON has come as the format of choice for data exchange.<div><br /></div><div>I remember reading a few posts on the tubes months ago saying that JSON was going to give XML a run for its money, and I thought folks were crazy.
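To ground that a bit: DBSlayer speaks JSON over plain HTTP. Here's a hedged Ruby sketch of what building a query request might look like — the endpoint shape (a {"SQL": ...} document URL-encoded into the query string of a GET against the proxy port) and the port number are from memory, so check them against the DBSlayer docs before relying on this:

```ruby
require 'cgi'
require 'json'

# DBSlayer queries are JSON documents sent over plain HTTP. This builds
# the request URL; the {"SQL" => ...} shape and /db path are assumptions
# recalled from the project docs, not verified here.
def dbslayer_url(host, port, sql)
  query = CGI.escape({ 'SQL' => sql }.to_json)
  "http://#{host}:#{port}/db?#{query}"
end

url = dbslayer_url('localhost', 9090, 'SELECT name FROM people')
puts url
# A real client would now GET this URL (e.g. with Net::HTTP) and
# JSON-parse the response body.
```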
I don't see JSON totally replacing XML, but I'm definitely seeing it used in many more places where XML would have been the natural choice a couple of years ago.</div><div><br /></div><div>DBSlayer also intrigues me because I can see writing some pretty interesting web apps without a traditional app server. I know that DBSlayer is intended to help with scaling your DB layer, but I think it would be cool to hit it directly from the browser. There would be problems with this approach... You'd pretty much only be able to build read-only apps where you don't care who sees the data... but I can see plenty of apps fitting that mold (corporate directories, stock data, sports data, etc.).</div>Anonymoushttp://www.blogger.com/profile/07819081702916184088noreply@blogger.com25tag:blogger.com,1999:blog-5993287118648755621.post-36993006405611278022007-12-07T20:19:00.000-05:002007-12-07T20:50:43.659-05:00Spring MVC, Data Binding and JPAA <a href="http://laika.sourceforge.net/">project</a> that I am working on is using Spring MVC. Naturally, we would like to use its data binding facilities to handle data coming in from HttpRequests into our beans. As mentioned by Matt Fleming in this <a href="http://mattfleming.com/node/134">blog post</a>, the Spring docs don't do a great job of explaining how to deal with data binding when the bean in question has a collection as a property. Something like this:<br /><pre><br />public class Person<br />{<br />    private String name;<br />    private List&lt;Address&gt; addresses;<br />}<br /></pre><br />As Matt mentions, using <a href="http://commons.apache.org/collections/api-release/org/apache/commons/collections/list/LazyList.html">LazyList</a> found in the <a href="http://commons.apache.org/collections/">Apache Commons Collections</a> is one way to get Spring to bind dynamic collections into your beans (in the example, use a lazy list for the addresses).
However, you can run into some issues if you're persisting your beans using JPA.<br /><br />We're using <a href="http://openjpa.apache.org/">OpenJPA</a>, and when we try to persist an object that has a LazyList in one of its properties, it chokes. The problem is that OpenJPA can't proxy LazyList because it doesn't have a default constructor. To get around this, we're applying a pretty simple hack. Before Spring does the data binding, we decorate the collection with a LazyList. Before we persist the object with JPA, we copy the collection back into a plain old ArrayList.<br /><br />An example of doing this can be seen with Spring MVC's SimpleFormController in <a href="http://laika.svn.sourceforge.net/viewvc/*checkout*/laika/trunk/web/src/main/java/org/projectlaika/web/DocumentLocationCreationController.java">DocumentLocationCreationController</a>. To help out with the whole process, we've created a <a href="http://laika.svn.sourceforge.net/viewvc/*checkout*/laika/trunk/web/src/main/java/org/projectlaika/web/util/LazyListHelper.java">LazyListHelper</a> to make the decorating and copying easier.<br /><br />If you're using Spring MVC with JPA, hopefully this will save you some headaches.Anonymoushttp://www.blogger.com/profile/07819081702916184088noreply@blogger.com21tag:blogger.com,1999:blog-5993287118648755621.post-59301951464305956532007-10-04T11:39:00.000-04:002007-10-04T14:54:03.517-04:00JRuby Compiler Performance<p>I was pretty excited when I heard that <a href="http://headius.blogspot.com/2007/09/compiler-is-complete.html">JRuby's compiler was complete</a>. I figured I could run some benchmarks against the C-Based Ruby Implementation to see how it was performing. I've only run this suite once, but I hope to provide enough info so that you could replicate the results yourself.
Please let me know if you think I missed something.</p><br /><span class="Apple-style-span" style="font-size:large;"><span class="Apple-style-span" style="font-weight: bold;">The Setup</span></span><br /><p>The benchmarks were run on a Dell D820 laptop. It has an Intel Core Duo running at 2.17GHz and has 1GB of RAM. I'm running Ubuntu 7.04 for an OS. The only wrinkle in my setup is that my root partition is encrypted using LUKS/dmcrypt. This will probably slow down the IO benchmarks, but I'm assuming it would penalize both Ruby implementations equally.</p><br /><p>For the C-Based Ruby Implementation, I'm using the Ubuntu packaged Ruby version 1.8.5. For JRuby, I checked out the latest from <a href="http://svn.codehaus.org/jruby/trunk/jruby/">Codehaus's SVN</a> (Rev 4474) and built the code using Sun's Java version 1.6.0.</p><br /><p>For the benchmarks, I pulled the suite from the <a href="http://svn.ruby-lang.org/repos/ruby/trunk/benchmark/">Ruby Lang SVN</a> (Rev 13608). </p><br /><p>I ran the benchmarks using the run.rb script. The C-Based Ruby Implementation was run with no command line options. 
I modified the script to run JRuby with the following options:</p><ul><li>-C to compile the Ruby code before running it</li><li>-O to disable ObjectSpace</li><li>-J-Djruby.thread.pooling=true to enable thread pooling</li><li>-J-server to put the JVM in server mode</li></ul><br /><p>These options are explained in the <a href="http://www.headius.com/jrubywiki/index.php/Performance_Tuning">Performance Tuning</a> and <a href="http://www.headius.com/jrubywiki/index.php/JRuby_Compiler">Compiler</a> pages of the JRuby Wiki.</p><br /><div><span class="Apple-style-span" style=" font-weight: bold;font-size:large;">The Results</span></div><br /><p>The following shows the time it took each implementation to perform the benchmarks, in seconds:</p><br /><a href="http://home.comcast.net/~a.gregorowicz/img/jruby-vs-cruby.png"><img src="http://home.comcast.net/~a.gregorowicz/img/jruby-vs-cruby.png" width="363" height="1167"/></a><br /><p>Here is the difference between the two implementations (CRuby - JRuby), again in seconds:</p><br /><a href="http://home.comcast.net/~a.gregorowicz/img/cruby-jruby.png"><img src="http://home.comcast.net/~a.gregorowicz/img/cruby-jruby.png" width="368" height="216"/></a><br /><p>As you can see, JRuby is performing really well on a lot of these benchmarks. It gets killed on eval, but I suppose that is to be expected. My guess is that JRuby is taking advantage of Java primitives to outperform CRuby at number crunching. Charles Nutter's comments at <a href="http://rawblock.blogspot.com/2007/06/ruby-vs-jruby-fractal-benchmark.html">another blog entry on JRuby benchmarking</a> seem to indicate that this would be the case.
I can't wait until JRuby hits 1.1.</p>Anonymoushttp://www.blogger.com/profile/07819081702916184088noreply@blogger.com19