(cons 'geek 'culture)

Cutting edge culture for geeks...

Monday, August 6, 2007

How to Build and Maintain a Small Beowulf Cluster

Over the summer I managed to get my very own three-node Beowulf cluster up and running. I found that there were many so-called "help" sites and mailing lists. However, none of them had what I really wanted: a concise, point-by-point guide containing workarounds for the maintenance and setup issues I might encounter. That is what I am setting out to write here. For further reading, you will want to check out: http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book.php (the Beowulf Book) as well as http://beowulf.org/ (the main Beowulf site).

One last note before we begin: I will attempt to make this guide as general as I possibly can. However, it will contain some tidbits specifically for SPARC users. With that, let's get started!

A Brief History of Beowulfery

Supercomputing seems to be all the rage these days. In the last decade, personal computing has soared to limits unimaginable to the fathers of computing. This is mainly because Moore's Law still seems to be in effect, with no slowdown in sight. It also means that building your very own supercomputer may be far cheaper than you think. Virtually all major systems on the Top 500 list (http://www.top500.com/) employ some sort of clustering. This is because it is cheaper nowadays to purchase several "commodity" computers and combine them in a cluster to produce the compute power needed.

The Next Step

So you think you want to build your very own Beowulf, eh? Be warned: this is still not a task for the timid, but help is readily available via the Internet (see the links above). My cluster currently consists of three Sun Microsystems Ultra 10s (440 MHz UltraSPARC IIi, 512 MB RAM each, 9.1 GB HDD each). Whatever you use, keep in mind that in a "true" Beowulf cluster all of the nodes have the same architecture (as well as the same HDD size, RAM amount, etc.).

The next thing to resolve is a place to put these nodes. In this guide I will assume you have a fairly small cluster (16 nodes or less). Please note that some of these techniques may not scale well to larger clusters. In my case, with only three nodes, I can keep them practically anywhere.

After you have scouted out your location, you will need to invest in a switch (the faster the better; if you can afford gigabit, do it!). I use a D-Link DSH-16 10/100 hub with built-in switch. You may also want to purchase a UPS (I have not, mainly because of the small number of nodes I have and the fact that my uptime is not crucial).

If you have several nodes, you may also want to invest in a KVM. This is really only practical if you have many nodes, or can find a small KVM cheap (most small PS/2 KVMs are under $100). I do not have a KVM because the cheapest model that supports Sun DINs is $740. There is also another Sun-specific issue I should mention here: if you don't have a Sun keyboard plugged in at boot, OpenBoot will dump you to a serial output. This means that in order to boot correctly into Linux, you will need to plug in a keyboard. I am still looking for a workaround to this.

Installing Linux

The next thing to do is install the OS on every node. For the Ultra 10s, Debian was my distro of choice. I chose Debian because it is an extremely stable distro, and it is trivial to set up and maintain. You will also find that on older systems Debian may be much easier to install than other distros.

I used Debian Etch's net install disk (http://www.us.debian.org/CD/netinst/). When you install, you need to make sure that each node has all the same software installed. What follows is a minimal list:

  1. GCC/G++
  2. Python
  3. All NFS packages
  4. All NTP packages

as well as anything else you might need/want. Try to make sure each HDD has the same amount of used space (this comes into play when installing and using NFS).
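As a rough sketch, the minimal list above could be installed in one go with apt-get. The package names below are assumptions based on Debian Etch naming (the list above does not spell them out), so check them against your release. The script only prints the command unless you pass run as its first argument.

```shell
#!/bin/sh
# Assumed Debian Etch package names for the minimal list (not from the original post).
PACKAGES="gcc g++ python nfs-kernel-server nfs-common portmap ntp"

# Print the install command; with "run" as the first argument, execute it (needs root).
install_packages() {
    if [ "${1:-}" = "run" ]; then
        # $PACKAGES is left unquoted on purpose so it splits into separate names.
        apt-get install -y $PACKAGES
    else
        echo "apt-get install -y $PACKAGES"
    fi
}

install_packages "$@"
```

Running the same script on every node is one way to keep the installed software identical across the cluster.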

You will also want to make sure every node has a static IP address. In Debian, edit the /etc/hosts file and /etc/network/interfaces file with the static IP addresses you choose. I used 192.168.0.100 and up.
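To make the static addressing concrete, here is a small sketch that prints /etc/hosts entries and an /etc/network/interfaces stanza for a three-node cluster starting at 192.168.0.100. The hostnames (node0, node1, ...), the eth0 interface name, the netmask, and the gateway address are all assumptions for illustration; substitute your own. The script only writes to standard output, so nothing is changed until you paste the result into the real files.

```shell
#!/bin/sh
BASE_NET="192.168.0"   # network prefix used in this guide
FIRST_HOST=100         # first node gets 192.168.0.100
NODE_COUNT=3           # number of nodes in the cluster

# Print one "IP hostname" line per node, suitable for /etc/hosts.
# The nodeN hostnames are hypothetical.
hosts_entries() {
    i=0
    while [ "$i" -lt "$NODE_COUNT" ]; do
        echo "$BASE_NET.$((FIRST_HOST + i)) node$i"
        i=$((i + 1))
    done
}

# Print a static stanza for node number $1, suitable for /etc/network/interfaces.
# eth0, the netmask, and the gateway are assumptions.
interfaces_stanza() {
    cat <<EOF
auto eth0
iface eth0 inet static
    address $BASE_NET.$((FIRST_HOST + $1))
    netmask 255.255.255.0
    gateway $BASE_NET.1
EOF
}

hosts_entries
interfaces_stanza 1
```

The hosts entries are the same on every node, while the interfaces stanza differs only in the address line.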

Installing NFS

The main point of a cluster is to use all of the nodes effectively to complete a computing task in far less time than it would take just one of these nodes to complete on its own. The easiest way to share information over a network is with NFS. Note that this is not the best way, merely the easiest. If you are serious about clustering, you should probably start using OpenAFS (http://www.openafs.org/) or Lustre (http://www.lustre.org/) after you get the hang of maintaining and using your cluster with NFS.

The main issue with NFS is that it is insecure on clusters because you have to use the no_root_squash option in /etc/exports, which lets root on a client node act as root on the exported files. When I set up NFS I followed this guide (http://nfs.sourceforge.net/nfs-howto/ar01s03.html) and this one (http://www.crazysquirrel.com/computing/debian/servers/nfs.jspx). It was actually much easier than I had first anticipated.

Before you continue, make sure you can ping and ssh into every node in the cluster!
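One way to run that check is a small loop over the node addresses. This is only a sketch: the -W timeout flag assumes the Linux iputils version of ping, and until a shared SSH key is in place ssh will prompt for a password on each node.

```shell
#!/bin/sh
# Check one node: first a single ping, then a trivial remote command over ssh.
check_node() {
    if ! ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
        echo "$1: no ping reply"
        return 1
    fi
    if ! ssh -o ConnectTimeout=5 "$1" true; then
        echo "$1: ping ok, ssh failed"
        return 1
    fi
    echo "$1: ping ok, ssh ok"
}

# Check every address given on the command line.
failed=0
for node in "$@"; do
    check_node "$node" || failed=1
done
[ "$failed" -eq 0 ] || echo "at least one node failed the check"
```

Saved under a hypothetical name such as check_nodes.sh, it would be run as: sh check_nodes.sh 192.168.0.100 192.168.0.101 192.168.0.102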

I decided to share /home and /usr/local. My /etc/exports looks like this:

/usr/local 192.168.0.101(rw,no_root_squash) 192.168.0.102(rw,no_root_squash)
/home 192.168.0.101(rw,no_root_squash) 192.168.0.102(rw,no_root_squash)
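With more than a couple of clients, typing these lines by hand gets error-prone. The sketch below generates the same two lines from a list of client addresses; the helper name exports_line is made up for this example. After the real /etc/exports changes, running exportfs -ra as root on the server makes it re-read the file.

```shell
#!/bin/sh
# Print one /etc/exports line: the directory ($1) followed by every
# client address with the rw,no_root_squash options used in this guide.
exports_line() {
    dir=$1
    shift
    line=$dir
    for client in "$@"; do
        line="$line $client(rw,no_root_squash)"
    done
    echo "$line"
}

CLIENTS="192.168.0.101 192.168.0.102"

# $CLIENTS is left unquoted on purpose so each address becomes its own argument.
exports_line /usr/local $CLIENTS
exports_line /home $CLIENTS
```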

Believe it or not, you are almost done! The only thing left to do is edit the client nodes' /etc/fstab by placing a line in each for each NFS exported directory. Below is the relevant portion of /etc/fstab for one of my nodes.

192.168.0.100:/home /home nfs rw 0 0
192.168.0.100:/usr/local /usr/local nfs rw 0 0

This will ensure that the NFS shared directories are mounted at boot up.
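After editing /etc/fstab, running mount -a as root should mount the new entries without a reboot. To confirm that the directories really are coming from the server, a check along these lines may help. It is a sketch that assumes a Linux client, where /proc/mounts lists the device, mount point, and filesystem type in its first three columns; the function name is made up for this example.

```shell
#!/bin/sh
# Succeed if mount point $1 is listed with a filesystem type starting with "nfs".
# $2 is the mounts table to read (defaults to /proc/mounts on Linux).
is_nfs_mounted() {
    awk -v mp="$1" '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Report the status of the two directories shared in this guide.
for mp in /home /usr/local; do
    if is_nfs_mounted "$mp"; then
        echo "$mp: mounted over NFS"
    else
        echo "$mp: not mounted over NFS"
    fi
done
```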

Last but not least, you need to start up NFS:

/etc/init.d/portmap start
/etc/init.d/nfs-kernel-server start
/etc/init.d/nfs-common start

Run ps -A and check that nfs (or nfsd) and portmap are running. For added security, you may want to add rules to /etc/hosts.allow and /etc/hosts.deny so that only machines on your local network can mount the NFS directories (if your cluster is isolated, this is not an issue).
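As a sketch of what such rules could look like, the script below prints a deny-everything block for /etc/hosts.deny and a matching allow block for /etc/hosts.allow. The daemon list follows the one suggested in the NFS HOWTO linked earlier, and the 192.168.0.0/255.255.255.0 network is an assumption based on the addresses used in this guide. hosts.allow is consulted before hosts.deny, so the local network gets in and everyone else is refused.

```shell
#!/bin/sh
# Daemons involved in NFS that honor hosts.allow / hosts.deny (per the NFS HOWTO).
NFS_DAEMONS="portmap lockd mountd rquotad statd"
# Assumed cluster network; adjust to match your addressing.
LOCAL_NET="192.168.0.0/255.255.255.0"

# Lines for /etc/hosts.deny: refuse everyone by default.
deny_rules() {
    for d in $NFS_DAEMONS; do
        echo "$d: ALL"
    done
}

# Lines for /etc/hosts.allow: let the local network through.
allow_rules() {
    for d in $NFS_DAEMONS; do
        echo "$d: $LOCAL_NET"
    done
}

echo "# /etc/hosts.deny"
deny_rules
echo "# /etc/hosts.allow"
allow_rules
```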

What's Next?

That's it! You now have a fully functional cluster. Next, you will want to install clustering utilities such as OpenMPI, UPC, and MOSIX. In addition, you will want to generate a common SSH key to allow access to all the nodes without having to enter a password.
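Because /home is shared over NFS in this setup, one possible approach to the common key is to generate a key pair once in a user's ~/.ssh and add the public half to authorized_keys in the same directory; every node then sees both files. The sketch below assumes an RSA key with an empty passphrase, which is convenient but means anyone who can read the private key can log in everywhere. The function name is made up for this example, and root's home directory is not under /home, so this covers ordinary users only.

```shell
#!/bin/sh
# Create a passphrase-less RSA key in directory $1 (default: ~/.ssh) if none
# exists yet, and make sure its public half is listed in authorized_keys.
setup_cluster_key() {
    ssh_dir=${1:-$HOME/.ssh}
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa" || return 1
    fi
    # Append the public key only if it is not already present.
    if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" 2>/dev/null; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage (run once, as the cluster user, on any node):
#   setup_cluster_key
```

The function is deliberately not called by the script itself, so sourcing it does not touch your existing ~/.ssh.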
