Remote graphical access to Unix machines

As far as I can tell, these are the options:

  • The venerable VNC, which basically compresses a screenshot of the entire desktop over and over again and is consequently pretty laggy. The project has forked into RealVNC, TightVNC, and UltraVNC, all of which are more or less interoperable.
  • Straight X11 protocol over SSH, which is unusably slow at least on my setup.
  • NX, a compressed X11 protocol. Commercial software, but the core compression library is open source, and there’s a free trial of the complete system. I’ve tested this on Ubuntu 10.10 and it works great.
  • NeatX, a free clone of the NX server using the original libraries.
  • FreeNX, a free clone of the NX server and client using the original libraries. I tried for a few hours to get this working but it kept crashing on Ubuntu 10.10.
  • x2go, a different protocol based on those same free libraries from NX. No Windows client, but it’s easy to install. I’m using it a lot these days and it works OK. It’s slightly slower than NX and occasionally crashes, but it is free.

It’s still way too hard to do this. Sometimes you really need graphical access!

Posted in Uncategorized | Leave a comment

What’s the difference between spot and on-demand EC2 instances?

EC2 allows you to rent VPS instances by the hour at either a high fixed rate (called “on-demand” instances) or a much lower rate determined by auction that comes with fewer features (called “spot” instances). However, Amazon’s documentation is horribly obfuscatory when it comes to which features you’re giving up by going with spot instances.

Here is what you give up by going with spot instances:

  • Your instance is terminated without warning if the spot price goes above your bid. This isn’t a huge issue, since even if you bid very high you’re still saving money on average by using spot instances.
  • On-demand instances can be “stopped” (turned off while keeping all attached disks alive) or “terminated” (turned off while deleting all attached disks). Spot instances can only be “terminated.” However, you can use ec2-modify-instance-attribute to set the delete-on-termination flag to false for each attached disk, which effectively turns a terminate into a stop. So this is not a true limitation.
  • You have little control over which availability zone (AZ) your spot instance is started in. If you specify an availability zone in a spot instance request, it may take much longer to start your instance — sometimes days! So all you can do is start your spot instance without specifying an AZ, and EC2 will put it in whichever one has room. This is a problem because volumes are only available in a single AZ.

I use EC2 mostly to run a single instance for long-running computations (my netbook just can’t hack it for running my research code, and my desktop is powerful but it’s at home). For my use case, it turns out that spot instances can be turned into on-demand instances. Here’s how it works:

  • On shutdown, I change the instance attributes so the disks aren’t destroyed, then make a snapshot of the disk.
  • On startup, I make an AMI that refers to the snapshot as the root volume, then request a spot instance for that AMI. This neatly avoids the problem of not being able to access volumes in a different AZ.
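
A minimal sketch of those two steps, under several assumptions: it uses the boto3 Python SDK rather than the ec2-* command-line tools mentioned above, the `ec2` argument is a boto3 EC2 client created elsewhere, and the device name, image name, architecture, and bid are placeholders rather than values from my setup.

```python
def shut_down(ec2, instance_id, volume_id, device="/dev/sda1"):
    """Keep the root disk on termination, then snapshot it."""
    # Assumed device name; check the instance's real block device mapping.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        BlockDeviceMappings=[
            {"DeviceName": device, "Ebs": {"DeleteOnTermination": False}}
        ],
    )
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id, Description="root disk before shutdown"
    )
    return snapshot["SnapshotId"]


def start_up(ec2, snapshot_id, max_price, device="/dev/sda1"):
    """Register an AMI backed by the snapshot and bid for a spot instance."""
    image = ec2.register_image(
        Name="research-" + snapshot_id,  # hypothetical naming scheme
        Architecture="x86_64",
        RootDeviceName=device,
        BlockDeviceMappings=[
            {"DeviceName": device, "Ebs": {"SnapshotId": snapshot_id}}
        ],
    )
    # No availability zone is given, so EC2 is free to pick one with room.
    request = ec2.request_spot_instances(
        SpotPrice=str(max_price),
        InstanceCount=1,
        LaunchSpecification={
            "ImageId": image["ImageId"],
            "InstanceType": "m1.large",
        },
    )
    return request["SpotInstanceRequests"][0]["SpotInstanceRequestId"]
```

Because the new root volume is created from the snapshot in whichever AZ the instance lands in, nothing here depends on a volume that lives in one particular AZ.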

By doing this, I’m paying $0.14/hr for an m1.large instance instead of $0.34/hr. It costs around $0.08/hr just in electricity to keep my desktop running, so EC2 isn’t exactly a burden. On days when I’m doing research software development, I just turn it on in the morning and leave it on all day.
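
To put the two hourly rates side by side, here is the arithmetic as a short sketch. The rates are the ones quoted above; the 8-hour working day is an assumption made only for illustration.

```python
SPOT = 0.14       # $/hr, m1.large spot price quoted above
ON_DEMAND = 0.34  # $/hr, m1.large on-demand price quoted above


def daily_cost(rate, hours=8):
    """Cost of one working day; the 8-hour default is an assumption."""
    return round(rate * hours, 2)


saving_per_hour = round(ON_DEMAND - SPOT, 2)
saving_fraction = saving_per_hour / ON_DEMAND

print(f"spot day: ${daily_cost(SPOT):.2f}")            # spot day: $1.12
print(f"on-demand day: ${daily_cost(ON_DEMAND):.2f}")  # on-demand day: $2.72
print(f"saving: ${saving_per_hour:.2f}/hr ({saving_fraction:.0%})")  # saving: $0.20/hr (59%)
```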


PCA of test scores

Here’s a principal component analysis biplot of test scores in a class I’m TAing:

The test had 15 problems. Each dot represents one student score, while the 15 vectors starting with V are problems on the test rotated into the coordinate system defined by the principal components. Better scores are in the top-left (signs are arbitrary).

There are several interesting things to note here. Questions 3, 5, 10 and 11 are almost exactly parallel: these questions turn out to be short answer programming questions. Questions 7 and 8 are not aligned with the bulk of the questions: they turn out to be low-value multiple-choice questions that many students probably skipped. In general, it looks like component 1 indicates a student’s ability to perform the core tasks on the test, while component 2 indicates whether they got 7 and 8 right. There’s a roughly bimodal distribution on component 2, which is just because these are all-or-nothing questions: when I removed these two questions and re-ran the analysis, the distribution was mostly uniform.

Question 14 was a long-form programming question that took about 1/3rd of the time on the test. It’s pretty much parallel to the rest of the questions. We were worried that students would spend time on 14 at the expense of other questions, so it is reassuring that they don’t seem to do so.

This seems to be a pretty good test, although some of the questions are basically redundant so it could be made shorter. There is no evidence that there are different underlying skills being tested; instead, it seems there is a single underlying “computer science skill” that correlates well to test performance.

In order to preserve a degree of secrecy, I renumbered problems and applied some random jitter to scores so it is impossible to identify individual students.
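
For readers who want to see why questions driven by one skill end up parallel in a biplot, here is a small dependency-free sketch. Everything in it is an assumption for illustration: the scores are synthetic (not the class data), there are only five questions, and the first component is found by power iteration on the covariance matrix rather than with whatever statistics package produced the plot above.

```python
import random


def first_component(rows, iterations=200):
    """Unit-length loading vector of the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance between every pair of questions.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by the covariance and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


random.seed(0)
scores = []
for _ in range(60):  # 60 synthetic students
    skill = random.gauss(0, 1)  # one shared underlying ability
    core = [skill + random.gauss(0, 0.3) for _ in range(4)]  # skill-driven questions
    all_or_nothing = [float(random.random() < 0.5)]  # unrelated to skill
    scores.append(core + all_or_nothing)

loadings = first_component(scores)
print([round(x, 2) for x in loadings])
```

The four skill-driven questions get nearly equal loadings on component 1, which is what parallel arrows in the biplot mean, while the all-or-nothing question gets a loading near zero and would show up on a later component instead.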

Posted in Uncategorized | Tagged | 1 Comment

Google Talk + Acer 1410 + Linux = problems

Some HDA sound chips, e.g. the Intel 82801I in my Acer 1410 laptop, report that they have a single 2-channel audio input when they really have two 1-channel inputs (a microphone and a mono input jack). This kind of thing can sometimes be fixed by playing with audio profiles or passing different options to the snd_hda_intel kernel module, but for my machine the only way I found to get acceptable audio quality at decent volume is to turn off one of the channels completely using pavumeter.

Unfortunately, both Google Talk and Skype force the mixer to set evenly balanced stereo input, which causes awful audio input quality, no audio, weird clicks and feedback, and such. The fix for Skype is easy and well documented: turn off “automatic volume adjustment” in the settings. The fix for Google Talk is to make a file in ~/.config/google-googletalkplugin/options with the single line audio-flags=1. This is completely undocumented and took me way too long to figure out. Hope it helps someone out there.


Trac shows blank page in Chrome

I use the excellent Trac project management system to host a couple of small projects. For as long as I’ve been using Trac, Chrome and Safari users sometimes get a completely blank page where Trac should appear. I had just been telling them to use Firefox or IE, but I decided to get to the bottom of the issue today.

This turned out to be harder than I expected. The issue didn’t occur with the demo Trac site, or with the Trac site for Webkit. Chrome’s Inspector showed an error but no useful debugging information, and nothing appeared in the server error logs. After a few false starts I tried connecting with wget, which worked fine normally but gave a connection error when using the same headers sent by Chrome (gleaned from Chrome’s Inspector). I decided to remove headers one by one until wget worked again.

When I removed the Accept-Encoding: gzip header, wget stopped reporting a connection error. After extensive Googling, I discovered that Apache’s mod_fastcgi returns an incorrect Content-Length header when gzip compression is enabled. Firefox and IE handle this gracefully, but Chrome and Safari abort the connection. The bug is apparently fixed in the mod_fastcgi source, but not in the version shipped with Ubuntu 10.10.
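
The remove-headers-one-by-one search is easy to script. The sketch below is a Python stand-in for what I did by hand with wget: `request_ok`, `first_header_that_matters`, and the simulated server are all hypothetical names, and the header values are placeholders rather than the ones Chrome actually sent.

```python
import http.client
import urllib.request


def request_ok(url, headers):
    """True if the URL can be fetched and fully read with these headers."""
    try:
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req, timeout=10) as resp:
            resp.read()  # a wrong Content-Length surfaces here
        return True
    except (OSError, http.client.HTTPException):
        return False


def first_header_that_matters(headers, check):
    """Drop headers in order until check(remaining) passes; return the last one dropped."""
    remaining = dict(headers)
    for name in list(headers):
        del remaining[name]
        if check(remaining):
            return name
    return None


def simulated_server(headers):
    """Stand-in for the broken server: fails whenever gzip is requested."""
    return "gzip" not in headers.get("Accept-Encoding", "")


chrome_like = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html",
    "Accept-Encoding": "gzip",
    "Accept-Language": "en-US",
}
print(first_header_that_matters(chrome_like, simulated_server))  # Accept-Encoding
```

Against a real site you would pass something like `lambda h: request_ok(url, h)` as the check instead of the simulated server.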

The answer is to compile mod_fastcgi yourself, or disable mod_deflate (preventing any compression of HTTP responses), or swap out mod_fastcgi for mod_fcgid.

The issue does not appear for the WebKit Trac site because that site uses the venerable mod_python instead of mod_fastcgi, and doesn’t appear for the Trac demo site because that server uses Lighttpd. It probably does appear for lots of other sites since the Trac Guide suggests using mod_fastcgi instead of mod_fcgid for FastCGI support.

Googling for this solution is really tough, since so many sites show bugs in other software using Trac, rather than bugs in Trac itself. Hopefully this post will come up in someone’s search results and make it a little easier to find.


On securing web apps

I recently wrote a little to-do list/time manager app using jQuery. I was impressed at how easy it was to put together this app quickly — I had it done in a weekend, with multiple menus, some custom widgets, about 10 different JSON calls, etc. I’m thinking about taking the app public, and since to-do lists tend to be private I want to make it as secure as possible. This is harder than I thought.

The app communicates with the server by sending commands to a small FastCGI written in Python. The server side is minimal and just knows how to run a couple of SQL queries. All the processing happens on the client side.

The #1 concern right now is preventing CSRF. In these attacks, the browser of a user who uses both my app and an evil site is directed to make a request to my app and send the results to the evil site. Some of the standard ways to prevent this won’t work:

  • I can’t use POST requests only, since JSONP requires GET
  • I can’t use a synchronized token that’s updated on each request (like a bank web site), since in my app sometimes multiple requests are in flight at once
  • I can’t check the Referer header since that isn’t always sent over HTTPS

I may have to use the CSRF-prevention-of-last-resort: creating a session key that’s sent both as a cookie and as a GET parameter, and checking that the two match up. This method is somewhat problematic because the session key has to be changed frequently (how frequently? who knows!), it leaks session keys into the server logs so those have to be kept secure too, and I can’t use any offsite resources like AddThis or Google Analytics.
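
A minimal sketch of that cookie-plus-parameter check on the server side, assuming the FastCGI script has already pulled the two values out of the request. The function names are hypothetical, and key rotation and expiry are deliberately left out.

```python
import hmac
import secrets


def new_session_key():
    """Random key to send back both as a cookie and for use as a GET parameter."""
    return secrets.token_urlsafe(32)


def request_is_authentic(cookie_key, param_key):
    """Accept the request only if the cookie and the GET parameter match."""
    if not cookie_key or not param_key:
        return False
    # Constant-time comparison; encode first so non-ASCII input cannot raise.
    return hmac.compare_digest(cookie_key.encode(), param_key.encode())
```

An evil site can make the browser send the cookie, but it cannot read the cookie's value, so it has no way to supply a matching GET parameter.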

As the to-do list accepts text from the client, I have to think about potential SQL, XSS, and JSON injection vulnerabilities. The first two are easy to prevent, but it doesn’t seem JSON injection is really on the radar yet so there aren’t good guides on how to counter it.
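
For the two easy cases, and a conservative approach to the third, a sketch along these lines may help. It assumes a SQLite database and a hypothetical `todo` table, neither of which is necessarily what the app uses; the point is only that user text is never pasted into SQL, HTML, or JSON by hand.

```python
import html
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE todo (id INTEGER PRIMARY KEY, text TEXT)")  # hypothetical schema


def add_item(text):
    """SQL injection: pass user text as a bound parameter, never via string formatting."""
    db.execute("INSERT INTO todo (text) VALUES (?)", (text,))


def items_as_json():
    """JSON injection: let a serializer do the quoting instead of building strings."""
    rows = db.execute("SELECT id, text FROM todo ORDER BY id").fetchall()
    return json.dumps([{"id": item_id, "text": text} for item_id, text in rows])


def item_as_html(text):
    """XSS: escape user text before it is placed into markup."""
    return "<li>" + html.escape(text) + "</li>"
```

This does not cover everything that gets called JSON injection (for example, a JSONP callback name supplied by the client would need its own validation), so treat it as a starting point.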

I’m using OpenID to handle authentication through mod_auth_openid. Unfortunately I’m not at all confident that mod_auth_openid is secure (I’ve already fixed a few bugs). Do I risk it or switch away from OpenID?

While it was remarkably easy for me to build the app, it’s surprisingly hard to get it ready for public consumption. The interconnections between seemingly orthogonal technologies are remarkably complex: HTTPS+JSONP=no good way to stop CSRF! I’m sure Google and Facebook have a team of crack engineers checking their apps for these problems, but how many smaller shops get all this stuff right?

Posted in Uncategorized | 1 Comment

Startup Weekend

Over the weekend I participated in Startup Weekend Twin Cities. If you’re not familiar with it, Startup Weekend attendees come up with the idea for an online venture on a Friday night, split into teams to carry out the best ideas over the weekend, and present their work in front of a panel of judges on Sunday night.

The pace of the event was incredible. We decided to make an Amazon Wishlist-style application for Facebook, and even before going to bed on Friday night we had a basic Facebook application that could search for and add products from Amazon’s API. By the end of Saturday we had the database backend implemented, and by midday Sunday we had a working application with more Facebook integration. We were getting affiliate payments by 3 on Sunday.

In the past I’ve mostly done solo projects, but the Startup Weekend style has some great advantages:

  • Immediate access to experts. Our team had three developers, a marketing expert with an MBA, a corporate lawyer, and a successful CEO on hand. We were able to get instant advice from members of other teams, ranging from Apache configuration issues to search engine keyword optimization. It’s hard to imagine ever having that kind of access without being part of a larger corporation or a startup incubator.
  • Rapid, honest feedback. I say “honest” for a reason: when you ask an outsider if an idea is good or not, they say “that’s fantastic” 99% of the time. A panel of judges with prizes to give out is much more critical.
  • Smaller investment than a startup. If you spend a year in a basement secretly coding up the next killer app, you might get fantastically wealthy. But you might get scooped by a larger company, end up with a project that users don’t need, or get distracted by life’s many uncertainties. If you fail to finish Startup Weekend, at least you’ve only lost a weekend of work.
  • Better for networking than attending a conference. If you go to conferences and seminars a lot, you quickly discover that many of the attendees have no real investment in the subject area and are just passively building a network. Some conferences try to fix this by charging enormous entry fees or hosting in an exotic location, but I can’t afford to go to those kinds of conferences! Startup Weekend costs just $75, but the requirement of 50 hours of focused participation keeps out the slackers.
  • Amazing mentorship opportunities. I can’t imagine any other way I would work closely with a developer who sold his systems management software to Dell, or end up having a beer on Sunday night with a partner in one of the largest law firms in the Twin Cities.

The event was only marred by the horrendous ice storm on Saturday night. I ended up walking back and forth 3 miles from my house to the U, which was only possible because I had a pair of boots with cleats. I even ended up lending out the cleats to others who were stuck inside the building because the sidewalk was so slippery!

We got second place overall. I’d be first to admit that our pitch wasn’t the most original we saw, but we filled a niche and the fact that we had working software before the deadline carried a lot of weight with the judges. I told Clint Nelson, the event organizer, that I learned more in 50 hours of Startup Weekend than in many of my 14-week, $4000 graduate school classes.

Many congratulations to the winners, Dueling Dates, and the organizers of Startup Weekend! You can see all the final presentations at tech.mn. And don’t forget to try our product!

Posted in Projects | 3 Comments