Wednesday, 16 December 2009

F#, Erlang and GPUs


Christmas time in olde Rochester towne, and as is traditional there is a fair in the grounds of the castle. I like carousels, and so do my kids, but here’s the thing: you can’t just go up to the carousel and go for a ride. The first thing you have got to do is wait for the carousel to stop and wait for the other people to get off. Once you get on you still don’t go anywhere; the guy running the carousel doesn’t want to run it half empty, so you have to wait for it to fill up. Only once it’s filled up, or if no one else turns up, do you get to ride. Finally, everyone gets the same ride: the music plays, the cranks turn, the horses go up and down, the same for everyone.

At ForensiT we’re right at the start of planning a major new project. For the first time, we’re thinking seriously about F#. F# is now a fully featured .NET language and ships as part of Visual Studio 2010. But why even consider F# instead of, say, Erlang?

When I was a kid, a long time ago before the computer was personal, there used to be a lot of moaning about the Japanese. All they did was take the products that we had invented and make them cheaper... oh, and... er... better. From a purely financial point of view, it’s not such a bad business model. Microsoft have a reputation for doing much the same thing...

Here’s some F# code of the kind Luca Bolognese demonstrates in his excellent PDC 2008 presentation (a sum-of-squares pipeline, sketched from memory rather than quoted verbatim):
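
    let sumOfSquares nums =
        nums
        |> Seq.map (fun x -> x * x)   // square each element
        |> Seq.sum                    // add them up

    // sumOfSquares [1; 2; 3] evaluates to 14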


Here’s some Erlang code that does the same thing (again a sketch, not a verbatim quote):
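
    sum_of_squares(Nums) ->
        lists:sum(lists:map(fun(X) -> X * X end, Nums)).

    %% sum_of_squares([1, 2, 3]) returns 14.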


Is it just me, or are there some similarities here? Except that this is where F# begins; it is not where it finishes. It is always vital to keep in mind that Microsoft has, to all intents and purposes, unlimited resources. If Microsoft are going to do a functional programming language then, eventually at least, they are going to come up with the most powerful functional programming language there is.

F# is a .NET language, which means that an F# programmer has complete access to all the .NET code that Microsoft and others have developed over the last decade. More importantly perhaps, an F# programmer will have access to all the future innovations in the .NET framework. What I’m thinking of here in particular is future support for multi-core processors and concurrency. The speed at which Microsoft can assimilate and support new hardware and software technologies is just way, way beyond what even well-supported projects like Erlang can manage. If you choose Erlang over F# you have to be aware that you are giving all this up.

One technology that is already pressing is the GPU, along with programming languages like CUDA. There are already third-party .NET libraries for CUDA and OpenCL. Without doubt, native support will follow. For Erlang to support GPU processing would require the Erlang runtime to be rewritten. Can we really expect the Erlang developers to spend precious time and effort on a technology which may not stand the test of time?

Notwithstanding the general point about future innovations in the .NET framework, how desirable would GPU support in Erlang actually be? The beauty of Erlang is in its message passing: multiple processes (“agents” in F#) signal each other when they want something done, or when they have a response. Just as a thought experiment, let’s imagine a moderately sized Erlang application with many processes. Let’s also assume that the vast majority of these processes carry out the same task in response to the same message; in other words, they are different instances of the same code. The application runs and messages start flying around; realistically, the different processes receive messages at different times.
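
In Erlang terms, that setup is just many instances of the same receive loop. A minimal sketch (the names and the stand-in task are mine, purely illustrative):

    %% Spawn N identical workers, each an instance of the same loop.
    start_workers(N) ->
        [spawn(fun worker/0) || _ <- lists:seq(1, N)].

    worker() ->
        receive
            {From, {task, Data}} ->
                From ! {self(), {result, do_task(Data)}},
                worker();
            stop ->
                ok
        end.

    do_task(Data) ->
        Data * Data.   %% stand-in for whatever the real work is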

What happens when a process receives a message with some data that it has to process? Here’s the first point where GPU code might be useful. If the task was sufficiently complex, and sufficiently parallelizable, being able to call GPU code would be a big help. (I don’t really have a clear idea about what sort of task this would be, but the calculation of a Mandelbrot set is the kind of thing I have in mind.) You can of course call C functions from Erlang, so some CUDA interop is already possible, leaving aside the question of how efficient it would be.
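
For what it’s worth, the usual route is a port: Erlang hands the data to an external C program, which could itself call into CUDA, over a simple binary protocol. A minimal sketch, assuming a hypothetical mandelbrot_port executable that speaks Erlang’s external term format:

    %% "./mandelbrot_port" is hypothetical: an external C program
    %% (perhaps CUDA-backed) reading 2-byte-length-prefixed packets.
    compute(Data) ->
        Port = open_port({spawn, "./mandelbrot_port"},
                         [{packet, 2}, binary]),
        Port ! {self(), {command, term_to_binary(Data)}},
        receive
            {Port, {data, Result}} ->
                port_close(Port),
                binary_to_term(Result)
        end.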

But what if the task each process carries out is not complex? Could GPU code make the basic infrastructure of Erlang, the framework of message passing, more efficient? Let’s go back to our thought experiment. A message comes in with data that needs to be processed. GPUs work with batches of data; they are fundamentally SIMD (Single Instruction, Multiple Data) devices. To take advantage of the GPU, the process would need to hand the data off to some kind of scheduler and wait. The scheduler wouldn’t want to run the GPU half empty, so it would wait until either it had enough data to use the available GPU cores, or the waiting time was up. Once the ride was over, the GPU would then have to send a result back to each of the waiting processes.
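
To make the thought experiment concrete, here is a sketch of such a scheduler in plain Erlang (the batch size, the timeout and the stand-in kernel are all illustrative; the real GPU call would sit inside run_batch/1):

    %% Collect requests until the batch is full or the timeout fires,
    %% process them in one go, then reply to each waiting process.
    scheduler(BatchSize, MaxWaitMs) ->
        scheduler(BatchSize, MaxWaitMs, []).

    scheduler(BatchSize, MaxWaitMs, Pending)
      when length(Pending) >= BatchSize ->
        flush(Pending),
        scheduler(BatchSize, MaxWaitMs, []);
    scheduler(BatchSize, MaxWaitMs, Pending) ->
        receive
            {From, {work, Data}} ->
                scheduler(BatchSize, MaxWaitMs, [{From, Data} | Pending])
        after MaxWaitMs ->
            flush(Pending),
            scheduler(BatchSize, MaxWaitMs, [])
        end.

    flush([]) ->
        ok;
    flush(Pending) ->
        {Froms, Batch} = lists:unzip(lists:reverse(Pending)),
        Results = run_batch(Batch),   %% the GPU kernel would run here
        lists:foreach(fun({From, R}) -> From ! {result, R} end,
                      lists:zip(Froms, Results)).

    run_batch(Batch) ->
        [X * X || X <- Batch].   %% CPU stand-in: same op on every element

The shape is exactly the carousel: riders queue up, the operator waits for a full ride or stops waiting, and everyone gets the same ride.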

The question, then, becomes whether it is faster to wait for the batching and unbatching of the GPU code, or faster just to wait for a time-slice on a CPU core. I certainly don’t have the answer. What we can say is that our thought experiment is deliberately GPU-friendly: if you only have a small number of processes, or if the processes are doing different things, a GPU is not going to give you any advantages.

The truth is that functional, message-passing languages and GPUs occupy two different parts of the parallel computing world. It is not as obvious as I first thought that they actually need each other.