Add Computation to the Mix: Deploying HPC Clusters on EC2 with Globus Provision

September 13, 2011   |  Borja Sotomayor

Picture this: You just got access to an awesome dataset, and you can't wait to start tinkering with it. Maybe it's stored in some remote observatory, or maybe it's right there on your laptop. Wherever it is, Globus Online can help you move it somewhere with serious computational muscle, so you can start hacking away at those oodles of data.

Except... you don't actually have an account on one of those fancy clusters or supercomputers.

Or maybe you do, but then your jobs will have to wait in a queue, and what you really want to do is just play around with all that wonderful data on more than just your laptop (or those lab servers in the corner of your office, precariously held together with duct tape). After you've had a chance to explore the data and polish up your code, then it might make sense to send off a gazillion jobs to a cluster, and wait patiently for the results. For now, you just need a quick and easy way to conjure up enough resources to start doing some science with your data. Right. Now.

That's where Globus Provision comes in: a tool we recently released that will deploy fully-configured Globus systems for you on Amazon EC2. Besides deploying common Globus services, Globus Provision can also deploy a Condor compute cluster for you (we will soon be adding support for Apache Hadoop too). All you need is an Amazon AWS account and, in a few simple steps, you will be able to deploy compute clusters on EC2 in just minutes. Plus, if you don't have an Amazon AWS account yet, you may be able to take advantage of their Free Usage Tier when you sign up and get 750 hours on EC2 to play around with, completely free. We are also working on adding support for "EC2-ish" clouds (such as Eucalyptus, OpenStack, etc.) so you can deploy clusters on your organization's private cloud.

The latest version of Globus Provision, released this week, also includes support for Globus Online, allowing you to easily attach a Globus Online endpoint to your cluster without having to worry about setting up a GridFTP server, requesting certificates, etc. This means that, once Globus Provision has prepared a cluster for you, you can use Globus Online to transfer your data into the cluster and start working with it right away.

If you're only going to use the cluster sporadically to experiment with your data, Globus Provision also allows you to suspend it and resume it at a later time (which means that, while it is suspended, you won't pay for the EC2 instances, only the comparatively low cost of storing your cluster's disk images). You can also dynamically add nodes to, and remove nodes from, a running cluster to meet your demands at the time.

You should be able to follow this guide even if you have no prior experience with Globus Provision. Of course, if you do want to learn more about Globus Provision, we encourage you to take a look at our extensive documentation.