Tuesday, 23 December 2008

Show us yer FLOPS!

What's the first thing you want to know about your cluster? Or anyone else's cluster, come to that? You want to know how fast it is, right? The usual way of measuring cluster performance is to count the FLOPS - the FLoating point Operations Per Second - the cluster is capable of executing. Counting FLOPS is the biggest pissing competition in the world of computing. Right now the cluster that can piss higher up the wall than any other is RoadRunner, capable of 1.7 petaflops (a petaflop is a 1 with 15 noughts - a million billion FLOPS, or a million gigaflops.)
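To keep the prefixes straight, here is a minimal Python sketch of the arithmetic. The helper name to_flops is mine, purely for illustration; the only figure taken from above is the 1.7 petaflops quoted for RoadRunner.

```python
# SI prefixes commonly used when quoting FLOPS figures.
PREFIXES = {"giga": 10**9, "tera": 10**12, "peta": 10**15}


def to_flops(value, prefix):
    """Convert a prefixed figure, e.g. (1.7, 'peta'), into plain FLOPS."""
    return value * PREFIXES[prefix]


if __name__ == "__main__":
    roadrunner = to_flops(1.7, "peta")
    print(f"{roadrunner:.2e} FLOPS")
    print(f"{roadrunner / PREFIXES['giga']:,.0f} gigaflops")
```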

The way to measure FLOPS is to run the High Performance Computing Linpack Benchmark - HPL. You can download HPL from Netlib. To build HPL you need two more things - mpi and a BLAS library. BLAS stands for Basic Linear Algebra Subprograms. GotoBLAS is recognized as being a fast implementation of BLAS, so I downloaded that. I ran the GotoBLAS quickbuild.64bit script and everything seemed OK, so I left it there.

It is probably worth mentioning that HPL can use Fortran 77, so I installed the g77 "compat-gcc-34-g77-3.4.6-4" package.

Building HPL is a case of creating a make file for your architecture. Fortunately, you can just edit one of the default files in the hpl "setup" sub directory. I used the Make.Linux_PII_FBLAS file and set the MPI directory as follows:

MPdir = /opt/mpich2-install

and the Linear Algebra library like this:

LAdir = /home/David/Sources/GotoBLAS

Then it's just a case of calling make specifying the right architecture:

make arch=Linux_PII_FBLAS

(Obviously the make file needs to match.)

The file you need to start benchmarking is xhpl in the (in my case) hpl/bin/Linux_PII_FBLAS/ sub directory. We need to run this with mpirun. However, there's an issue. In order to run an application across multiple nodes, that application's binary file (and any supporting libraries) needs to be on each node. Rebuilding a node image every time we want to run a new application is obviously out of the question, so what do we do? Fortunately Perceus can come to our aid.

Perceus supports what is known as "Hybridization". Hybridization is essentially file sharing. The idea is that files or folders in the VNFS image are replaced by symbolic links pointing to network-based files or folders. Unfortunately, it is at this point that Perceus' careful abstraction of the node file system falls down. To specify which files or folders get redirected you have to get into the guts of how Perceus organizes things.

I want to create a shared directory where I can put the binaries I want to run with mpi. This directory is going to be /opt/mpirun. Importantly, this directory has to exist on the head node, as well as all the cluster nodes. The first step is to add /opt/mpirun to the hybridize configuration file located at /etc/perceus/vnfs/vnfs_capsule_name. However, the /opt/mpirun directory specified in the hybridize file is not the /opt/mpirun directory on the host machine (the head node.) Oh no, this is the /opt/mpirun directory located on the physical representation of the VNFS file system that actually underlies the "virtual" VNFS file system of the nodes. In reality this is located at /var/lib/perceus/vnfs/vnfs_capsule_name/rootfs/opt/mpirun. It is this directory that actually gets shared. So because I want /opt/mpirun to exist on the head node as well, /opt/mpirun on the head node has to be a symbolic link back to /var/lib/perceus/vnfs/vnfs_capsule_name/rootfs/opt/mpirun. Not pretty. Finally, you need to mount the VNFS image and edit /etc/fstab so that the node connects to the share.
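The path mapping is easier to see written down. This is only a sketch of the naming convention described above, not anything Perceus itself provides: the function name backing_path is my own, and the capsule name in the example is the CentOS one I use elsewhere.

```python
import posixpath

# Where Perceus keeps the on-disk copy of each VNFS capsule (from the text above).
VNFS_ROOT = "/var/lib/perceus/vnfs"


def backing_path(capsule, node_path):
    """Return the head-node directory that really backs node_path on the nodes."""
    return posixpath.join(VNFS_ROOT, capsule, "rootfs", node_path.lstrip("/"))


if __name__ == "__main__":
    target = backing_path("centos-5.1-1.stateless.x86_64", "/opt/mpirun")
    # The head node's /opt/mpirun would be a symbolic link pointing at this path.
    print(target)
```

So the entry in the hybridize file is the node's view of the path, while the symbolic link on the head node points at the path this function returns.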

That done, we are ready to start testing the performance of the cluster by running xhpl. (Well, almost, I had to copy the libg2c.so.0 library to /usr/lib64/ on the nodes first.) Copy xhpl and HPL.dat to /opt/mpirun and go...
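One thing worth settling before the first run is the problem size N in HPL.dat. A commonly quoted rule of thumb (not something taken from my own runs) is to pick N so that the N x N matrix of 8-byte doubles fills roughly 80% of the cluster's total memory, rounded down to a multiple of the block size NB. A rough sketch follows; the node count, memory per node and block size are made-up figures, not my cluster's.

```python
import math


def hpl_problem_size(nodes, mem_per_node_gib, fraction=0.8, block_size=128):
    """Estimate HPL's N so the matrix uses about `fraction` of total memory."""
    total_bytes = nodes * mem_per_node_gib * 2**30
    # The matrix holds N*N double-precision values of 8 bytes each.
    n = int(math.sqrt(fraction * total_bytes / 8))
    # Round down to a whole number of blocks.
    return n - n % block_size


if __name__ == "__main__":
    # Hypothetical cluster: 5 nodes with 2 GiB each.
    print(hpl_problem_size(nodes=5, mem_per_node_gib=2))
```

Treat the result as a starting point only: the operating system and MPI need memory too, so a smaller fraction may turn out to be faster.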

Monday, 22 December 2008

Installing MPI

MPI stands for Message Passing Interface. When different instances of a process - usually running on different nodes - need to talk to one another, they do so using MPI. It has become a sort of de facto standard.

There are various implementations of MPI. I'm using MPICH2. Building and installing MPICH2 on the head node is no problem - just follow the instructions in the "From A Standing Start to Running an MPI Program" section of the Installer's Guide. The only thing you need to keep in mind is that you will need to replicate the install on all the nodes. For this reason I installed to /opt/mpich2-install via configure --prefix. I could then copy the mpich2-install directory to the mounted VNFS image:

perceus vnfs mount centos-5.1-1.stateless.x86_64
cp -r /opt/mpich2-install /mnt/centos-5.1-1.stateless.x86_64/opt/
perceus vnfs umount centos-5.1-1.stateless.x86_64

The key to getting mpi working, however, is to be able to ssh onto a node from the head node, and onto the head node from a node, without needing to enter a password. You must be able to ssh both ways. Happily, Perceus sets up the connection from the head node to the node, so you don't have to do anything. But to ssh onto the head node from a node - without needing a password - you have to do some work. On the node:

[root@node ~]#ssh-keygen -t rsa
[root@node ~]#cat .ssh/id_rsa.pub | ssh root@head_node 'cat >> .ssh/authorized_keys'

This generates a private/public key pair for root on the node and copies the public key to the head node so that root will no longer need to enter a password when using ssh to connect from the node. That's fine, except the cluster nodes are stateless - they don't have their own hard drives - so the next time the node reboots the configuration will be lost. The solution is to copy the keys and ssh settings back to the node VNFS image. If the VNFS image is mounted, you can do this:

[root@node ~]#scp -r ./.ssh head_node:/mnt/centos-5.1-1.stateless.x86_64/root

Keep in mind that a .ssh directory probably already exists, so you might want to get it out of the way first. But that's it - job done!
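One small wrinkle with the cat >> approach is that running it twice leaves the same key in authorized_keys twice. Here is a sketch of an append that only adds the key if it is missing. It works on the file contents as a string, the function name add_key is my own invention, and the key in the example is a placeholder, not a real key.

```python
def add_key(authorized_keys_text, public_key):
    """Return authorized_keys contents with public_key present exactly once."""
    key = public_key.strip()
    # Keep the existing non-blank lines as they are.
    lines = [line for line in authorized_keys_text.splitlines() if line.strip()]
    if key not in (line.strip() for line in lines):
        lines.append(key)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    placeholder = "ssh-rsa AAAA...placeholder... root@node"
    once = add_key("", placeholder)
    twice = add_key(once, placeholder)
    print(once == twice)
```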

There are a few things worth noting here. You only have to generate the public/private keys once. The same keys work for all nodes - they are not related to the host name of the node (which I thought they might be.) This also means that the keys work regardless of what nic your mpi traffic is going over. (More of which later.) On mpi itself, all you have to do is make sure that the files are in the same place on each system, and it does the rest. Very smart. After that it is just a case of starting mpi on each node:

mpdboot -n 5 -f mpd.hosts

Running mpdtrace should now show you the names of your nodes.
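If you have more than a handful of nodes, checking the mpdtrace output by eye gets tedious. The sketch below compares it against the list of nodes you expected. It assumes mpdtrace prints one host name per line, and the node names are hypothetical.

```python
def missing_nodes(expected_hosts, mpdtrace_output):
    """Return the expected hosts that do not appear in mpdtrace's output."""
    running = {line.strip() for line in mpdtrace_output.splitlines() if line.strip()}
    return [host for host in expected_hosts if host not in running]


if __name__ == "__main__":
    # Hypothetical node names and a hypothetical capture of mpdtrace output.
    expected = ["head", "n0000", "n0001", "n0002", "n0003"]
    captured = "head\nn0000\nn0001\nn0003\n"
    print(missing_nodes(expected, captured))
```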

Let's go clustering...

The reason I need CentOS is because I want to experiment with clustering. I first looked at clustering earlier this year, building a small 4 node beowulf cluster by following the "Configuration Notes" for Joel Adams' inspirational Microwulf. However, the Microwulf model is not easily scalable: for example, you need to manually create and configure a file partition on the host for each of the nodes. I want to look at something more industrial. After reading this article on the Linux Magazine website, I thought Perceus sounded just what I needed.

Perceus is an "enterprise and cluster provisioning toolkit" and supersedes the older Warewulf provisioning tools.

Perceus turned out to be pretty easy to build and install. I ended up needing to download and build all of the dependencies from the Perceus website, but nothing too onerous. I then downloaded, and "imported" into Perceus, the Caos NSA 1 VNFS "capsule". Let me just unpack that :-) Caos is a high performance, lightweight distribution of Linux. (NSA stands for "Node, Server, Appliance".) VNFS stands for Virtual Node File System. The idea is that you package up an Operating System - like Caos - into a VNFS capsule which you can then easily distribute, run and manage on your cluster nodes with Perceus. In fact the VNFS system works really well.

My "head node", running Perceus on top of CentOS 5.2, has three network cards. One nic talks to the outside world, while the other two talk to the cluster. To allow this, I fixed the firewall to completely open up the two internal network connections by adding the following lines to /etc/sysconfig/iptables:

-A RH-Firewall-1-INPUT -i eth0 -j ACCEPT
-A RH-Firewall-1-INPUT -i eth1 -j ACCEPT

That done, I started a node with a monitor attached and watched it boot into Caos. Very cool. Except that's when my problems began...

I ran perceus node status on the head node to see what state Perceus thought my node was in. Unfortunately it showed "init" and not "ready". Then I remembered that I hadn't installed provisiond on the node image. provisiond is a client-side daemon that runs on each node and talks to perceus (running on the head node) to let it know what is going on with the node.

Following the instructions in the Perceus "User Guide" I spent the best part of two DAYS trying to install provisiond. It should be as easy as this:

rpm -ivh --root /mnt/caos-nsa-node-1.0-1.stateless.x86_64 \
/usr/src/rpm/RPMS/x86_64/perceus-provisiond-1.4.0-1898.x86_64.rpm

The problem is that Caos is so high performance and so lightweight that it doesn't seem to have a working version of rpm - or any other package manager - installed. Most of those two days were spent trying to install rpm, or trying to work out what I'd missed in the User Guide. I hadn't missed anything: the User Guide is simply wrong.

Towards the end of the second day I thought I had better just check that provisiond wasn't already installed on the Caos image. It was. So the User Guide is doubly wrong: the wrong instructions for something that didn't need to be done in the first place. My node status problem had nothing to do with provisiond.

Nevertheless, the lack of a package manager is a big problem for me. One of the things most people will want to do is run mpi based applications on their cluster. That means you have to install mpi on the nodes. mpi is dependent on Python. Python isn't installed on Caos, so how are you going to install it? Not with rpm or yum or apt-get, that's for sure. Want to build it from source? How are you going to install a compiler? Perhaps that isn't an issue; perhaps you can compile it on the head node and use ./configure --prefix or something to install it to the mounted VNFS image. Are you sure all the libraries are going to be there?

Maybe this problem doesn't arise if you are also running Caos on the head node - I don't know. I gave up on Caos and used the centos-5.1-genchroot.sh script in /usr/share/perceus/vnfs-scripts (not "vnfs-tools" as it says in the User Guide) to create a CentOS VNFS image. The script worked perfectly and provisiond installed instantly first time.

My node status problem was down to a combination of two things. Firstly a DNS problem fixed by setting the correct nameserver entry to the head node in the /etc/resolv.conf file on the nodes; and secondly by getting the eth0 and eth1 device IP entries in the right order in the /etc/perceus/modules/ipaddr configuration file. Easy when you know how...

Wednesday, 3 December 2008

All CentOS is theft... Or is it?

Up to now I have been running Ubuntu as my Linux Server OS of choice: for me, the Fedora stack is updated far too frequently for it to be a viable server option. However, for reasons that will become apparent in future posts, I need to run a Red Hat type server. The problem is I don't want to pay for it... Wouldn't it be good if I could get hold of the Red Hat Enterprise code, without having to pay for support? Enter The Community Enterprise Operating System:

CentOS is an Enterprise-class Linux Distribution derived from sources freely provided to the public by a prominent North American Enterprise Linux vendor. CentOS conforms fully with the upstream vendors redistribution policy and aims to be 100% binary compatible. (CentOS mainly changes packages to remove upstream vendor branding and artwork.)
That "prominent North American Enterprise Linux vendor" is, of course, Red Hat. The CentOS FAQs make the following points:
  • CentOS-x is NOT Red Hat® Linux, it is NOT Fedora™ Core. It is NOT Red Hat® Enterprise Linux. It is NOT RHEL.
  • CentOS-x does NOT contain Red Hat® Linux, Fedora™ Core, or Red Hat® Enterprise Linux.
  • CentOS is built from publicly available open source SRPMS.
And so, dear reader, we enter the bizarre world of Open Source licensing. I am not saying that the folks at CentOS are a bunch of liars and thieves - clearly what they are doing is perfectly acceptable in the Open Source world. But where else would it be acceptable? What if I go and get a can of Diet Coke, scrub off the printing, and then put on my own label? In what sense would that NOT be Diet Coke? Would The Coca-Cola Company not sue my sorry ass if I tried to pass this stuff off as my own - even if I was giving it away for free?

Now any Open Sourcers out there reading this may well be saying you just don't get it. And you know what? I don't. I write software for a living. If someone took my code, changed the logos, and then passed it off unchanged as their own, I would have some issues with that. But Open Source software isn't like that, right? It is a community effort, right? RHEL a community effort? The creation of those SRPMS files a community effort? I don't think so. (I'd better stop there, I'm starting to sound like Lewis Black from the Daily Show...)

So I'm not going to use CentOS? Wrong! I'm definitely going to use CentOS. He who lives by the sword dies by the sword. If Red Hat want to play in a world where software has no value, that's up to them.

Going /home

With the release of Fedora 10 I thought I would take another look at running Linux on my laptop. I haven't changed my mind, however: I stick by my assertion that Linux is a great workbench, but a lousy desktop. I love working on Linux, but I'll be installing Fedora 10 on a separate hard drive.



One of the things that might have caused my previous installation of Fedora to break so badly was that I upgraded it: from version 7 to version 8. It turns out that just because you can upgrade a system doesn't mean that you should: section 2.1.4.3 of the Release Notes states "In general, fresh installations are recommended over upgrades." But if the OS is changing every 6 months, what do you do? Backing up your data (and all your settings) every time and then restoring it all to your new system is a serious pain. One answer to this is to create a separate /home partition. By doing this you can keep your stuff out of harm's way when the system and the applications get nuked by the installer.

Creating a separate /home partition is beautifully easy on Fedora 10. When the installer gets to the bit about "Select which drive(s) to use for this installation", you just need to check the "Review and modify partitioning layout" box. On the next page, click the "New" button and then add a new partition with /home as the mount point; the only other thing you need to do is set the size. I also checked "Encrypt" - meaning my personal data would be encrypted :-)

The Fedora installer does everything else for you - including resizing the other partitions so that your new /home partition fits.

Cool. Except before I could do this, I still needed to back up my data and settings from my old Fedora installation that didn't have a separate /home partition. To get this done, I created an archive of my old home directory:

#cd /home
#tar cfv David.tar David/

I then burnt David.tar to a DVD. Once Fedora 10 was installed I created my new "David" user account and let Fedora create a home directory. I then copied my David.tar file from the DVD to /home:

#cd /media
#cd "Personal Data, Dec 01, 2008"
#cp David.tar /home

I then deleted the "David" home directory that Fedora had created for me, and recreated it from the .tar file:

#cd /home
#rm -rf /home/David
#tar xvf ./David.tar

Finally, I needed to make my user account the owner of the directory.

chown -hR David /home/David

I logged on, and it worked! My desktop appeared just as I remembered it. Maybe I didn't need to create a /home partition after all...
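For anyone who would rather script the round trip, here is the same archive-and-restore idea as a Python sketch using the standard tarfile module. It is an illustration only, not what I actually ran: it works on throwaway temporary directories rather than /home, the function names are mine, and it leaves out the DVD and the chown step.

```python
import os
import tarfile
import tempfile


def archive_dir(parent, name, archive_path):
    """Roughly: cd parent; tar cf archive_path name/"""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(os.path.join(parent, name), arcname=name)


def restore_dir(archive_path, parent):
    """Roughly: cd parent; tar xf archive_path"""
    with tarfile.open(archive_path) as tar:
        tar.extractall(parent)


def demo():
    """Round-trip a throwaway 'David' directory through a tar file."""
    with tempfile.TemporaryDirectory() as old_home, \
            tempfile.TemporaryDirectory() as new_home:
        os.mkdir(os.path.join(old_home, "David"))
        with open(os.path.join(old_home, "David", "notes.txt"), "w") as f:
            f.write("my settings")
        archive = os.path.join(old_home, "David.tar")
        archive_dir(old_home, "David", archive)
        restore_dir(archive, new_home)
        with open(os.path.join(new_home, "David", "notes.txt")) as f:
            return f.read()


if __name__ == "__main__":
    print(demo())
```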


Footnote. Sadly, though, not everything worked. I had hoped that the version of Evolution on Fedora 10 would just pick up all my email. No chance. It starts some conversion process... and then crashes every time. What a piece of crap that application is. The lack of an enterprise class mail client is one of the biggest failures of the Linux desktop.

Monday, 21 July 2008

The End of the Affair

How do you fall out of love? It’s hard to say. The little annoyances that you have always put up with slowly become festering grievances. But do you fall out of love because they become grievances, or do they become grievances because you fall out of love?

There is usually some event which, when you look back, you realise was the moment when things began to go wrong. For Fedora and me, it was a few months ago when after yet another kernel update the sound on my laptop - which had been working perfectly - stopped working. The volume control suddenly had a red X next to it. Bizarrely system-config-soundcard showed that the sound card was configured correctly and could play a test sound. Clicking on the volume control however, just resulted in “No volume control GStreamer plugins and/or devices found.” Why?

Yes, I found a post on LinuxQuestions.org that discussed the problem. But by now I realized I was beginning to wonder why I should bother - so perhaps it was me. Was there a problem with PulseAudio? Was it a security issue? After spending several hours completely removing PulseAudio and after reinstalling all the ALSA packages, sound was working in MPlayer and Adobe Flash Player in Firefox, but I still had no volume control.

There have been other grievances of course; there is never just one thing. Suspend was working fine: now it doesn’t. If I suspend my laptop I have to hard boot it to get it to come back to life. Could I fix this? Probably. But it is one thing to have to get something working that has never worked: it is quite another to have to fix something that has been needlessly broken. I’m fed up spending my weekends scouring the Linux support forums for fixes to problems I shouldn’t have. Life is too short.

I want to get my laptop working with Internet Connection Sharing on my phone. Yes, I could get it working… But I’ve got a wife and children I want to spend time with, and code that I want to write.

Late in the first decade of the twenty-first century what do we use our Personal Computers for? You will have your own ideas, but I would suggest browsing the Internet, reading email, storing and playing music, watching video. Linux is not very good at any of these things. Firefox is hampered by font issues, Evolution is grim, MPlayer is great – when it works. Why do I have to reinstall the MPlayer plugins whenever Firefox is upgraded? Because they are two separate applications and nothing on Linux is joined up. I would pay for a distro that took away all the hassle - that managed my installation to keep it updated but kept it working. There is no such distro.

So after eighteen months of using Fedora as my everyday laptop OS I’m going back to Vista. It’s not that I won’t be using Linux anymore - far from it. Linux now has a permanent place in our server and development environments. Linux is a great workbench but it is a lousy desktop.

Am I giving up completely on a UNIX compliant desktop which works, is stable, and supports the twenty first century? Well there is always this :-)

Tuesday, 10 June 2008

Samba and LDAP: a Wind-up

I had to revisit this because I thought I must have missed something: must have got something wrong. But I really don't think I have. The developerWorks document I quoted in an earlier post was written before the release of Samba 3. It says that "There are two things a Samba/LDAP installation cannot do 'out of the box' ". The first is "Retrieve user account information from an Windows 2000 Active Directory server"; the second is "Alleviate the need for /etc/passwd." Both these issues, the document confidently expects "will be resolved with the release of Samba 3.0." But it didn't happen. Instead, the Samba documentation states that "The second item [removing the need for /etc/passwd] can be accomplished by using LDAP NSS and PAM modules." Except that these modules are already installed on Fedora. Checking the /etc/nsswitch.conf file for the necessary entries:

passwd: files ldap
group: files ldap

shows that these entries are in place.
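If you want to make the same check from a script rather than by eye, something like the following Python sketch would do. It only parses the text of the file; the function name nss_sources is mine, and the sample text is just the two lines quoted above.

```python
def nss_sources(nsswitch_text, database):
    """Return the list of sources configured for one database, e.g. 'passwd'."""
    for line in nsswitch_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.startswith(database + ":"):
            return line.split(":", 1)[1].split()
    return []


if __name__ == "__main__":
    sample = "passwd: files ldap\ngroup: files ldap\n"
    for database in ("passwd", "group"):
        print(database, "ldap" in nss_sources(sample, database))
```

On a real system you would read /etc/nsswitch.conf and pass its contents in place of the sample string.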

Everything I've read implies that I should be able to achieve what I want to do with Samba and LDAP: create groups and user accounts in LDAP and have them access file and print resources on the Samba server. And yet I've not been able to achieve that in the time I've had available. I'm disappointed. My feeling is that unless you really need to use the features of LDAP you are probably better off using Samba's tdbsam backend instead.

CLDAP

I couldn't of course just leave things as they were, so I did some digging. Everything I've read indicates that openldap, and by extension FDS, does not support UDP. As a result, an openldap server cannot respond to CLDAP queries such as those made by a Windows client. That doesn't always seem to have been the case. Earlier versions of openldap seem to have had a compile time option --enable-cldap. This option appears to have been dropped. I found a reference here to the option no longer being available in version 2.1.22. Certainly in the configure script of the current 2.4.10 source code there is no mention of --enable-cldap.

I couldn't find any formal announcement, but perhaps the reason CLDAP support was dropped from openldap was because the protocol has been buried as an Internet standard. Its epitaph is recorded in RFC 3352.

From a practical point of view, all this means is that you have to treat a Samba server - even one with an LDAP backend - as an NT 4.0 server and connect to it via NetBIOS. If you specify a DNS name, Windows thinks you are connecting to Active Directory. This is made clear by the subtly different error message you get on Vista:


Don't think this is the end of CLDAP, however. It is obviously still being used by Active Directory, and if it is still being used by Active Directory, it will have to be supported by Samba 4.0.

Monday, 9 June 2008

Joining a Samba Domain

Joining a Windows workstation to a Samba domain is quite instructive: it tells you an awful lot about how a Windows workstation joins a Windows domain...

Here's what I got the first time I tried to join a Windows XP machine to my RIVERSIDE Samba domain:


The error message goes on: "The query was for the SRV record for _ldap._tcp.dc._msdcs.RIVERSIDE"

What's happened here is that Windows has first tried to look up the NetBIOS name "RIVERSIDE" assuming - correctly - that it is not a DNS name. When the NetBIOS name look up failed, Windows then queried DNS. Most of this error message relates to the DNS look up, and as such is quite misleading.

The standard solution you will find on all the Samba discussion forums is to add the Samba server to the workstation's list of available WINS servers. One simple way of doing this is to get your DHCP server to pass on the address of the Samba server. If your DHCP server is Linux you just need to add a line like the following to your /etc/dhcp3/dhcpd.conf:

option netbios-name-servers 192.168.2.8;

When the workstation next gets an IP address from the DHCP server it will know about your Samba server (running WINS) and will be able to resolve the domain NetBIOS name. This works. And maybe, if you're sensible, you should stop there. However, aren't you a bit curious about the "SRV record" stuff?

If, instead of supplying a NetBIOS domain name, we supply a DNS domain name, like riverside.forensit.com, Windows will attempt to look up - not the DNS domain name itself - but a service location (SRV) resource record for a Domain Controller for the domain. In other words, it queries DNS for something like:

SRV _ldap._tcp.dc._msdcs.RIVERSIDE.FORENSIT.COM
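The name Windows asks for is just a fixed prefix stuck on the front of whatever domain name you typed in. A tiny Python sketch of that, based only on the two error messages quoted in this post (the function name is my own):

```python
# Prefix seen in the Windows error messages quoted in this post.
DC_LOCATOR_PREFIX = "_ldap._tcp.dc._msdcs."


def dc_locator_query(domain):
    """Build the SRV name a Windows client looks up for the given domain."""
    return DC_LOCATOR_PREFIX + domain.strip(".")


if __name__ == "__main__":
    print(dc_locator_query("RIVERSIDE"))
    print(dc_locator_query("RIVERSIDE.FORENSIT.COM"))
```

That is also why the first error message mentioned _ldap._tcp.dc._msdcs.RIVERSIDE: Windows had simply treated the NetBIOS name as if it were a DNS domain.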

In order for your DNS server to answer this query you need to add some lines to your DNS config files:

_ldap._tcp IN 1H SRV 0 100 389 medway
_ldap._tcp.riverside.forensit.com-site._sites IN 1H SRV 0 100 389 medway
_ldap._tcp.riverside.forensit.com_site._sites.dc_msdcs IN 1H SRV 0 100 389 medway
_ldap._tcp.gc._msdcs IN 1H SRV 0 100 389 medway
_ldap._tcp.dc._msdcs IN 1H SRV 0 100 389 medway
_ldap._tcp.dc._msdcs.riverside.forensit.com IN 1H SRV 0 100 389 medway
_ldap._tcp.gc._msdcs.riverside.forensit.com IN 1H SRV 0 100 389 medway
_gc._tcp IN 1H SRV 0 100 3268 medway
_gc._tcp.riverside.forensit.com-site._sites IN 1H SRV 0 100 3268 medway

On Fedora, these get added to the /var/named/*.db file. You can add these entries using system-config-bind, although (rightly) it does complain.
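In case the columns in those entries look like line noise: after the name, class and TTL, an SRV record carries a priority, a weight, a port and a target host, in that order. Here is a small Python sketch that pulls one of the entries apart. It assumes the simple eight-field layout used above and is in no way a general zone file parser.

```python
from collections import namedtuple

SrvRecord = namedtuple("SrvRecord", "name ttl priority weight port target")


def parse_srv(line):
    """Split a 'name IN ttl SRV priority weight port target' line into fields."""
    name, rr_class, ttl, rr_type, priority, weight, port, target = line.split()
    if rr_class != "IN" or rr_type != "SRV":
        raise ValueError("not an IN SRV record: " + line)
    return SrvRecord(name, ttl, int(priority), int(weight), int(port), target)


if __name__ == "__main__":
    record = parse_srv("_ldap._tcp.dc._msdcs IN 1H SRV 0 100 389 medway")
    print(record.port, record.target)
```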

After restarting named on the DNS server, the Windows workstation should be able to get a response to its SRV query. On my riverside.forensit.com domain it can... But - and throughout this whole long exercise there has always been a but - that doesn't get me much further. Windows returns the error "A domain controller for the domain RIVERSIDE.FORENSIT.COM could not be contacted." The details are:

DNS was successfully queried for the service location (SRV) resource record used to locate a domain controller for domain RIVERSIDE.FORENSIT.COM:

The query was for the SRV record for _ldap._tcp.dc._msdcs.RIVERSIDE.FORENSIT.COM

The following domain controllers were identified by the query:

medway.riverside.forensit.com

Which is all absolutely correct. I can ping medway.riverside.forensit.com (the DC.) I can look it up using nslookup. So what's going on now?

I resorted to Wireshark. Having successfully queried for the DC name, the XP workstation sent a search request for the "ROOT" base object of the LDAP directory over CLDAP (Connectionless LDAP.) It was this request that failed: the server returning "Destination unreachable (Port unreachable)" over ICMP. However, the LDAP error log on my server just reports " - slapd started. Listening on All Interfaces port 389 for LDAP requests." It's almost like a firewall problem, except Firestarter reports no blocked connections.

So that's as far as I've got. I hate leaving unanswered questions, but I've already spent far too long on Samba: the questions will have to wait. It may be that what I'm attempting here can't be done. I've come across some posts by glauco.b on the Ubuntu Forums saying that openldap does not support UDP - which is what the CLDAP query goes over.

Friday, 6 June 2008

Roaming Profile Error Logging On To Samba

When I attempted to log on to my Samba domain from an XP workstation I got this:


Followed by this:


Except I'm not using roaming profiles.

The solution was to explicitly set the "logon path" and "logon home" parameters to nothing in the /etc/samba/smb.conf file:

logon path =
logon home =

I seem to remember a customer complaining about this in the past. Now I know.

Samba A101

RULE ONE: Before you can add a user to the smbpasswd password file, that user must have a UNIX/Linux account (typically stored in /etc/passwd) on the system hosting the Samba service.

And that's what I was missing. You can't add a Samba user unless the posix user account exists first. This applies to Samba implementations with an LDAP "back end" just as much as it applies to traditional Samba implementations. It never occurred to me that this would be the case. On a Windows Active Directory server, you don't have to create a local user account as well as creating a user account in AD, the LDAP database. On NT 4.0 Servers - which Samba was originally developed to imitate - things are different, of course. But in February 2000 Windows moved on: Samba has still to catch up eight years later.
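Had I known about Rule One, a check like the following before calling smbpasswd would have saved me a lot of grief. It is a sketch using Python's standard pwd module, which as far as I understand it asks the system's normal account lookup rather than reading /etc/passwd directly; the function name is my own.

```python
import pwd


def has_posix_account(username):
    """Return True if the system can resolve username to a POSIX account."""
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    # Account names taken from this post; the results depend on your system.
    for name in ("Administrator", "FDSPC$"):
        print(name, has_posix_account(name))
```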

Suddenly the scales have fallen from my eyes. Fedora Directory Services, and indeed other Linux LDAP implementations, are at best "bolt ons" for Samba. LDAP does not fundamentally change the way Samba operates, and emphatically Samba + LDAP does not equal AD.

Crucially, Samba cannot integrate LDAP with the Linux file system. On a Windows server, you can create a group in AD and then give that group access to a folder. Samba 4 will have its own LDAP directory that will allow it to be an LDAP server for AD clients. However, there will still be no integration with the file system. As a recent article in Linux Magazine puts it:

On a more somber note, Samba4 lacks great integration with the POSIX system on which it sits. It cannot map NT ACLs into POSIX ACLs, it requires users and groups to be added to Linux as well as to its internal ldb database.

So why bother implementing Samba (3) with LDAP at all? For the moment I can't think of a single good reason. Apparently an LDAP back end provides better scalability and performance. But any organization needing scalability and performance isn't going to be looking at hacking around with Samba and LDAP anyway. They are going to look to Active Directory, and the product that Active Directory set out to destroy - Novell Netware. I'm suddenly very interested in how OES integrates with the Linux file system - if indeed it does.

My illusions shattered, I got Samba and LDAP working quite quickly. It was just a case of creating the accounts I needed using useradd or system-config-users. This included creating the "Administrator" account (which then allowed me to use pdbedit to set the SID) and my Windows workstation accounts:

/usr/sbin/useradd -n -c "Workstation FDSPC" -M -d /nohome -s /bin/false FDSPC$

I then called "smbpasswd -D 10 -a Administrator" as I did before. This time, however, I got this:

smbldap_search_ext: base => [dc=riverside,dc=forensit,dc=com], filter => [(&(uid=Administrator)(objectclass=sambaSamAccount))], scope => [2]
ldapsam_getsampwnam: Unable to locate user [Administrator] count=0
pdb_set_username: setting username Administrator, was
pdb_set_full_name: setting full name Administrator, was
pdb_set_domain: setting domain RIVERSIDE, was
...etc...

which makes sense: Samba checks LDAP to see if the account exists and creates it if it doesn't. I used smbpasswd again to create the workstation accounts:

/usr/bin/smbpasswd -a -m FDSPC$

This done, I was finally able to add my Windows workstation to my Samba LDAP domain.

Thursday, 5 June 2008

Getting More Information From smbpasswd

smbpasswd has 10 debug levels specified by the -D switch. Level 10 is not recommended: the man page says "Levels above 3 are designed for use only by developers and generate HUGE amounts of log data, most of which is extremely cryptic." Really?

smbpasswd -D 10 -a testuser

gives us this:

smbldap_search_domain_info: Searching for:[(&(objectClass=sambaDomain)(sambaDomainName=RIVERSIDE))]
smbldap_search_ext: base => [dc=riverside,dc=forensit,dc=com], filter => [(&(objectClass=sambaDomain)(sambaDomainName=RIVERSIDE))], scope => [2]
The connection to the LDAP server was closed
smb_ldap_setup_connection: ldap://riverside.forensit.com
smbldap_open_connection: connection opened
ldap_connect_system: Binding to ldap server ldap://riverside.forensit.com as "cn=Directory Manager"
ldap_connect_system: successful connection to the LDAP server
ldap_connect_system: LDAP server does not support paged results
The LDAP server is successfully connected
pdb backend ldapsam:ldap://riverside.forensit.com has a valid init
smbldap_search_ext: base => [dc=riverside,dc=forensit,dc=com], filter => [(&(uid=testuser)(objectclass=sambaSamAccount))], scope => [2]
ldapsam_getsampwnam: Unable to locate user [testuser] count=0

testuser can't be found. Of course it can't be found - we're trying to create it!

More Samba Problems

I am getting to grips with how a Windows client discovers the Samba Domain Controller on the network. I will cover this in a later post. Unfortunately, there is still something profoundly wrong with my installation of Samba which I need to resolve before going any further.


It started when I was following the FDS instructions for setting up Samba. Having religiously followed every step, I got to the bit about setting up the Administrator account:

# smbpasswd -a Administrator -w ldap-admin-password

This returned:

Setting stored password for "cn=Directory Manager" in secrets.tdb

Was that right? I suspect not. What I've discovered is that if I replace "Administrator" here with any account name - including no name at all - I get the same result!

The next step was where it really went wrong. This is supposed to modify the Samba Administrator account to use the correct SID (one ending in a RID of 500):

# pdbedit -U $( net getlocalsid | sed 's/SID for domain RIVERSIDE is: //' )-500 -u Administrator -r

All I get is:

Username not found!
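For reference, here is a minimal sketch of what the SID substitution in that command is meant to do. It assumes the output of net getlocalsid has the form "SID for domain NAME is: S-1-5-21-..."; the admin_sid function name is my own, not part of Samba.

```shell
# Read "net getlocalsid" output on stdin and print the domain SID with the
# Administrator RID (500) appended. Prints nothing if the input does not
# look like the expected "SID for domain ... is: S-..." line.
admin_sid() {
    sed -n 's/^SID for domain .* is: \(S-[0-9-]*\)$/\1-500/p'
}

# Usage on the Samba server (not run here):
#   pdbedit -U "$(net getlocalsid | admin_sid)" -u Administrator -r
```

Because the function prints nothing when the input doesn't match, an empty result is a quick sign that net getlocalsid itself is not returning what the instructions expect.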

The final step in the instructions is to test creating a new user:

# smbpasswd -a testuser

Which just returns:

Failed to modify password entry for user testuser

I've done a lot of googling on this. There are plenty of reports of this sort of error, but precious few answers.

Tuesday, 3 June 2008

Connecting to Samba... or not...

I have managed to get over most of my connection issues. These were down to a combination of DNS configuration problems, and firewall issues.

The DNS problems stemmed from the fact that there were "allow-query" and "listen-on" entries in the /etc/named.conf file that effectively restricted queries to the localhost. So, for example, dig @localhost riverside.forensit.com would work, but using the IP address dig @192.168.2.8 riverside.forensit.com would fail. My guess is that these entries were written to the file when I was messing around with system-config-bind, although I don't recall changing anything.
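A quick way to spot this kind of restriction is to ask the same question of each address the server should answer on. The sketch below is only an illustration: check_dns and the DIG variable are names I've made up, and it assumes dig is installed.

```shell
# Command used for lookups; override DIG to substitute another resolver tool.
DIG="${DIG:-dig}"

# check_dns NAME SERVER...
# Query NAME against each SERVER and report whether an answer came back.
# A refused or timed-out query gives no answer and is reported as FAILED.
check_dns() {
    name="$1"
    shift
    for server in "$@"; do
        if answer="$("$DIG" +short "@$server" "$name")" && [ -n "$answer" ]; then
            echo "$server: ok"
        else
            echo "$server: FAILED"
        fi
    done
}

# Usage (not run here):
#   check_dns riverside.forensit.com localhost 192.168.2.8
```

If localhost reports ok and the LAN address reports FAILED, the "allow-query" and "listen-on" entries are the first place to look.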

After fixing the DNS problems, I installed Firestarter. I might have been able to get away with Fedora's own system-config-firewall, but I'm familiar with Firestarter - and it logs blocked connections so you can easily see what is going on. Once I'd opened up the Samba, DNS, LDAP and HTTP ports - not forgetting the Fedora Management Console port - I was starting to get somewhere. I was still unable to see my "Riverside" domain on the network, however. I couldn't see it from a Windows machine, and I couldn't even see it from "Network" on the server itself.

Being able to browse for the domain on the network is dependent on the Samba nmbd daemon (that's service for the Windows-minded.) nmbd is controlled by the line

wins support = yes

in the /etc/samba/smb.conf file. You would think then, that when you started Samba, nmbd would automatically be started too. Well I did. It wasn't. I manually started "nmb" via system-config-services and "Riverside" appeared under "Microsoft Windows Network" on my XP machine.

Almost there? Unfortunately there is still something missing. Although I can "see" the domain, I cannot get a Windows machine to join the domain. There's still more work to do.

Wednesday, 28 May 2008

Installing Samba with Fedora Directory Server

The instructions are here:

http://directory.fedoraproject.org/wiki/Howto:Samba

All I can say is follow the instructions to the letter. Do not deviate from the instructions in any way. I had all sorts of problems, but they were all related to mistakes I'd made following the instructions. Eventually however, I did get the Samba service to start without any errors.

So I'm all set, right? No... Although everything appears to be running correctly locally, I cannot connect to the machine - as a domain controller - across the network, and I cannot connect via Fedora Management Console from another computer.

It really shouldn't be this difficult.

Tuesday, 20 May 2008

Installing VMware Tools on Fedora

I always forget the steps involved, so here they are for reference.

1. Install the required development packages:

yum install kernel-devel gcc gcc-c++

2. Untar the vmware-tools-distrib from the "virtual" Install CD. (Don't bother with the .rpm file.)

3. cd to vmware-tools-distrib. Run ./vmware-install.pl

That's it! Except I have had to add the following lines to /etc/X11/xorg.conf to get the display to work:

Section "Monitor"
Identifier "vmware"
EndSection


There are much better instructions here.

Wednesday, 14 May 2008

Fedora Management Console Trouble

When I tried to log on from Fedora Management Console the first time, it didn't work. All I got was an HTTP 404 "Not found" error.


Checking the /var/log/apache2/error.log file showed that apache was trying to serve /var/www/admin-serv which didn't exist. When the machine booted, there was also an error when apache2 started: "VirtualHost _default_:8443 --mixing * ports and non-* ports with a NameVirtualHost address is not supported..."

I am not proud to say that it took me days to overcome this. It took me installing and setting up a Fedora 8 server (because I thought the problem was related to running FDS on Ubuntu) only to find that I got the exact same HTTP 404 "Not found" error logging on to the Fedora server from the Management console.

It may well be that there was more than one problem with the Ubuntu installation - when I have time I will go back and check. To connect to the Fedora server I eventually found that all I had to do was specify the port number used when I ran setup-ds-admin.pl.


Doh!

Installing Fedora Directory Server

It turns out that our customer is using Fedora Directory Server. Installing an Ubuntu server was probably not the best option then...

No matter. It is certainly possible to install FDS on Ubuntu: the full instructions are here. Mercifully, however, someone who should be given a medal has created the install packages. These can be downloaded here. It is just a case of adding

deb http://ubuntu.opencodes.org gutsy main

to the /etc/apt/sources.list file, running apt-get update, and then running

apt-get install fedora-ds-admin
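If you would rather script those steps than edit the file by hand, something like the following would do. It is a sketch only: add_line_once is a hypothetical helper of mine, and the point of it is simply to avoid adding the same repository line twice if you run it again.

```shell
# add_line_once FILE LINE
# Append LINE to FILE only if an identical line is not already present.
add_line_once() {
    file="$1"
    line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage as root (not run here):
#   add_line_once /etc/apt/sources.list 'deb http://ubuntu.opencodes.org gutsy main'
#   apt-get update
#   apt-get install fedora-ds-admin
```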

I also ran apt-get install fedora-idm-console. I don't know if this is really necessary. Ubuntu server doesn't have a desktop, but I do want to use the Windows Fedora IDM Console remotely. That done, I ran

setup-ds-admin.pl

It failed. The error was a result of a "Netscape Portable Runtime error - 5977: libicui18n.so.36: cannot open shared object file: No such file or directory". The solution to this was to install libicu36:

apt-get install libicu36

Why this library wasn't already installed, I don't know - unless it is installed with the desktop. I also installed Termcap, (mistakenly) thinking this had something to do with it. The Ubuntu instructions say that Termcap should be installed. However, installing Termcap on 64-bit Ubuntu turns out to be a pain. It requires downloading the 64-bit rpm from Fedora, running alien to convert it to a .deb file, and then installing the file using dpkg. There are good instructions here.

Once libicu36 was installed, I ran setup-ds-admin.pl again and it worked perfectly! I just accepted the defaults - happily noting all the correct domain name settings :-) - and the job was done.

Tuesday, 13 May 2008

Configuring DNS on Fedora

Out of interest I decided to take a look at Fedora's DNS configuration tool, system-config-bind. system-config-bind turns out to be pretty horrible. Coming from a Windows background, I suppose I've come to expect GUI tools to provide a level of abstraction, to take the complexity out of system configuration. That's what Windows Server Wizards do, and it's what we at ForensiT hope our own Wizards do. system-config-bind doesn't do that. If you don't have the knowledge to write your own bind configuration files, don't expect system-config-bind to help you out. The only level of abstraction it provides is from the actual configuration files themselves.

Start up system-config-bind and you see the following:


If you click the "New" button, or right click on "DNS Server", you get the chance to add a new item. (I'm not sure this is the right word to use, but it will do for now.) We want to create a new zone, so that's what I'll do. I then get this:


This gets my nomination for the worst GUI of the year award. There are three OK buttons! THREE! How are you supposed to know what to do first? It is an abomination to the art of user interface design.

What you have to do is click on each of the top two OK buttons. Starting with the top left, select the class from the drop-down list; in this case it is "IN". Click the top left OK button.


Great, a dialog box that looks almost exactly the same as the first one! However, we're down to two OK buttons so we must be making progress. We're creating a Forward zone so we just click the top OK button.


Finally, a dialog box that can be understood. We just need to type in our domain name - not forgetting the dot at the end. system-config-bind does actually remind you about this. That done, when you click on OK you can enter the details for the Zone:

It is not the most friendly dialog box I've ever seen, but it is relatively straightforward. When you've filled in your settings and clicked OK, you have created your zone.

Next you need to create the A records for your domain: highlight the zone, right-click or click "New" and choose "A IPv4 Address".


There is some benefit to using system-config-bind then. It creates the reverse DNS settings for you, so you don't have to mess around creating and editing .arpa files. To be fair, there are other benefits too. By selecting "DNS Server" and clicking the "Properties" button you get to edit a whole range of, well, DNS server properties. It is just a pity the User Interface was - don't think we can use the word designed - created by someone who hasn't got a clue.

When you're done, you can right-click "DNS Server" and choose "Start Server"

Setting the FQDN

Another quick tip for those like me who are consolely challenged... A Linux machine's Fully Qualified Domain Name is set in the /etc/hosts file. The entry just needs to be something like this:

127.0.0.1 medway.riverside.forensit.com medway localhost

You can check the FQDN by running hostname with the -f switch.
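If hostname -f doesn't give you what you expect, it can help to see which name the hosts file actually lists first for an address. Here's a small sketch; hosts_fqdn is a made-up helper name, and it assumes the usual hosts layout of an IP address followed by names.

```shell
# hosts_fqdn FILE IP
# Print the first name listed for IP in a hosts-format file.
hosts_fqdn() {
    awk -v ip="$2" '$1 == ip { print $2; exit }' "$1"
}

# Usage (not run here):
#   hosts_fqdn /etc/hosts 127.0.0.1
#   hostname -f
```

The two should agree. If they don't, check the order of the names on the line: the fully qualified name needs to come before the short one.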

Monday, 12 May 2008

Configuring DNS

Having fixed my VMware woes, installation of Ubuntu Server was easy. I chose the DNS, OpenSSH and Samba server options. OpenSSH is extremely useful: it allows you to open a secure console onto the server from another machine - including a Windows machine using a utility like PuTTY.

When you set up a new Windows domain you need to set up a DNS server for that domain. I'm assuming the same thing goes for Samba. Setting up a DNS server on Ubuntu isn't difficult, but it is long-winded and it does require that you edit a whole bunch of text files. This isn't so bad if you are using a GUI, but Ubuntu server doesn't install a desktop by default so you're stuck with console based text editors. You can install a desktop, like GNOME, but so much stuff you don't need gets installed along with it, stuff like Evolution and GIMP, that it's probably better to get along without it.

So which text editor? Unix hard men will now roll up their sleeves to reveal vi tattooed on their sallow, bloated arms. If you can get used to vi good luck to you. (There is a good tutorial here.) You can also use nano.

On Ubuntu there is a hierarchy of DNS configuration files. (It is different on Fedora, so what follows is probably not applicable. You are probably going to be using something like system-config-bind anyway.) Top of the pile is /etc/bind/named.conf. named.conf has entries to "include" two other files: /etc/bind/named.conf.options and /etc/bind/named.conf.local. You will probably not need to edit named.conf itself.

named.conf.options allows you to set a "forwarders" entry to a nameserver that can resolve all the domain names your DNS server doesn't know about. You just need to uncomment the block and enter the IP address. At ForensiT we already have a DNS server, so that server's IP address goes in here.

named.conf.local is where your domain really begins. Given that our new domain is going to be "riverside.forensit.com" and the server name is "medway", we need to add something like this:

zone "riverside.forensit.com" {
type master;
file "/etc/bind/zones/riverside.forensit.com.db";
};

We will also need to add a zone definition for reverse DNS:

zone "2.168.192.in-addr.arpa" {
type master;
file "/etc/bind/zones/rev.2.168.192.in-addr.arpa";
};
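The reverse zone name looks odd until you notice it is just the network part of the address with the octets in reverse order and ".in-addr.arpa" on the end. A tiny sketch, assuming a /24 network like ours (reverse_zone is my own name for the helper):

```shell
# reverse_zone ADDRESS
# Print the reverse-lookup zone name for the /24 network containing ADDRESS.
# Only the first three octets are used, so 192.168.2 and 192.168.2.8 both work.
reverse_zone() {
    echo "$1" | awk -F. '{ print $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Usage:
#   reverse_zone 192.168.2.8
```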

Now we need to create the two files referenced in the entries we just created. The zone .db file needs an entry like this:


A couple of things are worth pointing out in passing. "admin.riverside.forensit.com." is not a server: this information is interpreted as an email address(!) and is required. The line immediately below is the version of the file; it is based on the date with a number appended. As well as adding an "A" record for the server, I've added an "A" record for the domain as well - this follows what Windows does.
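A zone file matching that description might look something like the sketch below. Treat it as an illustration rather than a copy of my file: the names and the 192.168.2.8 address come from this post, but the TTL and timer values are assumptions, and zone_serial and print_zone are helper names I've invented.

```shell
# zone_serial REVISION [YYYYMMDD]
# Build a date-based serial: the date followed by a two-digit revision.
# REVISION is a plain integer (1, 2, ...); the date defaults to today.
zone_serial() {
    rev="$1"
    day="${2:-$(date +%Y%m%d)}"
    printf '%s%02d\n' "$day" "$rev"
}

# print_zone SERIAL
# Print a minimal forward zone file to stdout. Timer values are assumed.
print_zone() {
    serial="$1"
    cat <<EOF
\$TTL 86400
riverside.forensit.com. IN SOA medway.riverside.forensit.com. admin.riverside.forensit.com. (
        $serial ; serial: date plus a two-digit revision
        28800      ; refresh
        3600       ; retry
        604800     ; expire
        38400 )    ; negative-cache TTL
riverside.forensit.com. IN NS medway.riverside.forensit.com.
riverside.forensit.com. IN A  192.168.2.8
medway                  IN A  192.168.2.8
EOF
}

# Usage (not run here):
#   print_zone "$(zone_serial 1)" > /etc/bind/zones/riverside.forensit.com.db
```

Remember to bump the serial every time you change the file, otherwise any secondary servers won't pick up the change.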

Similarly, we create the .arpa file:
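As a rough sketch of what the reverse file might contain, assuming illustrative timer values and a single PTR record for medway at 192.168.2.8 (print_reverse_zone is my own helper name, not part of BIND):

```shell
# print_reverse_zone SERIAL
# Print a minimal reverse zone file for 2.168.192.in-addr.arpa to stdout.
# "@" stands for the zone name given in named.conf.local; timers are assumed.
print_reverse_zone() {
    serial="$1"
    cat <<EOF
\$TTL 86400
@   IN SOA medway.riverside.forensit.com. admin.riverside.forensit.com. (
        $serial ; serial
        28800      ; refresh
        3600       ; retry
        604800     ; expire
        38400 )    ; negative-cache TTL
@   IN NS  medway.riverside.forensit.com.
8   IN PTR medway.riverside.forensit.com.
EOF
}

# Usage (not run here):
#   print_reverse_zone 2008051201 > /etc/bind/zones/rev.2.168.192.in-addr.arpa
```

The "8" at the start of the PTR line is the last octet of the server's address; the rest of the address is implied by the zone name.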


We're getting there. However, a DNS server needs a static IP address, so before doing anything else we need to edit the /etc/network/interfaces file:

auto eth0
iface eth0 inet static
address 192.168.2.8
netmask 255.255.255.0
gateway 192.168.2.1

(I rebooted at this point.)

The use of all these config files is a recipe for trouble. Fortunately, before starting bind we can check that there aren't any problems with the files. We just need to run:

named-checkconf -z /etc/bind/named.conf

If all is well, we are almost ready to fire up our DNS server. There is one more file to edit, however. We need to change the entries in /etc/resolv.conf to reflect the new configuration:

search riverside.forensit.com
nameserver 192.168.2.8

(The previous entries were set by DHCP.) Finally, we can start bind:

sudo /etc/init.d/bind9 start

If you want to check for any errors on start up, you can look in the /var/log/daemon.log file. (Handy to know if, like me, you're only used to checking log files from the Desktop.) We can now use dig to make sure that our DNS server is doing what it should be:

dig riverside.forensit.com

Our existing DNS server needs to know about the new domain, so we add a forwarders entry for the new domain in the named.conf.local file of the existing DNS server:

zone "riverside.forensit.com"{
type forward;
forwarders{192.168.2.8;};
};

Don't forget to restart bind!

At ForensiT we find that customers frequently forget to do this when setting up a new domain, which leads to all kinds of problems. If your DNS server is a Windows server, you can find instructions in the User Profile Wizard User's Guide on creating a forwarders entry for the new domain.

The next step is to configure Samba. Before we do that, however, we need to set up LDAP.


If you're after some proper instructions on setting up DNS on Ubuntu try these links:

https://help.ubuntu.com/community/BIND9ServerHowto
http://ubuntuforums.org/showthread.php?t=236093
http://www.ubuntugeek.com/dns-server-setup-using-bind-in-ubuntu.html




VMware Problem Running 64-bit Ubuntu

I'm not off to a good start :-( As soon as I try to boot the VMware virtual machine I get this:


I'm trying to install the 64-bit version of Ubuntu Server. The host machine is a Dell PowerEdge 2950 with dual quad-core 64-bit Xeons. I've got a choice: I can down the server - and all the virtual servers running on it, of course - and check the BIOS settings or just download the 32-bit version.

Installing the 32-bit version of Ubuntu Server wouldn't get to the bottom of the problem, so I went for a reboot and checked the BIOS. Sure enough, virtualization support was disabled. (Why?!) More importantly, enabling virtualization fixed the problem.

Installing a Samba Domain

One of our potential customers is having problems joining an XP workstation to their new Samba 3.0.28 domain using User Profile Wizard. Although we've tested using Samba before, we really need to build an up-to-date Samba domain in our lab for troubleshooting purposes... So that's what I'm going to do.

I'm using Ubuntu because, much as I love Fedora, I don't want to have to upgrade the kernel every week. (That's OK if you're using the latest laptop, not so great for your server.) And I'm using Ubuntu 7.10 (Gutsy Gibbon) and not the current version 8.04 (Hardy Heron) just because I've got it to hand. The new server will be installed as a VMware virtual machine on VMware Server, which itself runs on Ubuntu Server 7.10. What I'm going to do here is record my steps, primarily for my own reference, but also because - if you're reading this - it might be of some use to you.

First step: create the virtual machine.