Monday, 22 December 2008

Installing MPI

MPI stands for Message Passing Interface. When different instances of a process - usually running on different nodes - need to talk to one another, they do so using MPI. It has become the de facto standard for this sort of thing.

There are various implementations of MPI. I'm using MPICH2. Building and installing MPICH2 on the head node is no problem - just follow the instructions in the "From A Standing Start to Running an MPI Program" section of the Installer's Guide. The only thing you need to keep in mind is that you will need to replicate the install on all the nodes. For this reason I installed to /opt/mpich2-install via configure --prefix. I could then copy the mpich2-install directory to the mounted VNFS image:

perceus vnfs mount centos-5.1-1.stateless.x86_64
cp -r /opt/mpich2-install /mnt/centos-5.1-1.stateless.x86_64/opt/
perceus vnfs umount centos-5.1-1.stateless.x86_64
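
For reference, the build itself is just the usual configure/make routine. A sketch, assuming an MPICH2 1.0.x source tarball (use whatever version you actually downloaded):

tar xzf mpich2-1.0.8.tar.gz
cd mpich2-1.0.8
./configure --prefix=/opt/mpich2-install
make
make install
export PATH=/opt/mpich2-install/bin:$PATH   # as the Installer's Guide suggests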

The key to getting MPI working, however, is being able to ssh onto a node from the head node, and onto the head node from a node, without needing to enter a password. You must be able to ssh both ways. Happily, Perceus sets up the connection from the head node to the node, so you don't have to do anything there. But to ssh onto the head node from a node - without needing a password - you have to do some work. On the node:

[root@node ~]# ssh-keygen -t rsa
[root@node ~]# cat .ssh/id_rsa.pub | ssh root@head_node 'cat >> .ssh/authorized_keys'
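
A quick sanity check - assuming head_node resolves from the node, this should print the head node's hostname without prompting for a password:

[root@node ~]# ssh root@head_node hostname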

The first command generates a private/public key pair for root on the node; the second appends the public key to the head node's authorized_keys, so root no longer needs to enter a password when using ssh to connect from the node. That's fine, except the cluster nodes are stateless - they don't have their own hard drives - so the next time the node reboots the configuration will be lost. The solution is to copy the keys and ssh settings back into the node's VNFS image. If the VNFS image is mounted, you can do this:

[root@node ~]# scp -r ./.ssh head_node:/mnt/centos-5.1-1.stateless.x86_64/root

Keep in mind that a .ssh directory probably already exists in the image, so you might want to move it out of the way first. But that's it - job done!
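
For example, on the head node, with the image still mounted, something like this will move the existing .ssh aside before you copy the new one in:

mv /mnt/centos-5.1-1.stateless.x86_64/root/.ssh /mnt/centos-5.1-1.stateless.x86_64/root/.ssh.orig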

There are a few things worth noting here. You only have to generate the public/private keys once. The same keys work for all nodes - they are not tied to the host name of the node (which I thought they might be). This also means that the keys work regardless of which NIC your MPI traffic is going over. (More on that later.) As for MPI itself, all you have to do is make sure that the files are in the same place on each system, and it does the rest. Very smart. After that it is just a case of starting the mpd daemons on the nodes:

mpdboot -n 5 -f mpd.hosts
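
mpd.hosts is just a plain list of node hostnames, one per line; with -n 5, mpdboot starts one mpd locally (on the head node) plus one on each of the four listed nodes. Something like this, with made-up node names:

n0000
n0001
n0002
n0003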

Running mpdtrace should now show you the names of your nodes.
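
From there you can give the ring a trivial job to chew on - assuming /opt/mpich2-install/bin is on your PATH, this should print one hostname per process, spread across the nodes:

mpiexec -n 4 hostname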
