(UPDATE: please see the next post: Stop the Presses: OCaml wins in terms of speed in which it is shown that there were some serious flaws in my OCaml implementation below. I’ll keep this post around for reference.)
I’ve been dabbling with OCaml for the last several years and so when I saw a recent Meetup notice about an F# machine learning code dojo being led by Mathias Brandewinder within walking distance of my house I thought I’d check it out. I’m one of those Linux guys who tends to eschew IDEs, prefers to use vim, and considers .NET to be some very foreign territory. F# is based on OCaml, so I figured I’d have a bit of a headstart in the language area even if I haven’t done any .NET development.
Also, I was curious as I had been hearing rumors on twitter that F# programs were running on Mono faster than the equivalent OCaml versions. I was also taking Andrew Ng’s Coursera Machine Learning course so the timing seemed good. So I loaded up mono, monodevelop and F#. As it turns out none of it worked on the day of the dojo, so I looked on with someone with a Windows laptop and Visual Studio.
The code dojo itself was a great way of learning F#. Check out Mathias' Digits Recognizer code Dojo code and instructions which in turn was based on a Kaggle Machine Learning challenge. The task was to implement a k-NN (K Nearest Neighbor; in this case K=1) algorithm for recognizing handwritten digits from a dataset that was provided: A training set of 5000 labeled examples and a validation set of 500 labeled examples. Each digit example is a set of 28X28 grayscale pixels (784 integer values from 0 to 255). We wrote a classifier that reads in the training set and then compares each of the 500 validation samples against those 5000 training exmples and returns the label associated with the closest match (the vector with the shortest distance from a training example). k-NN is one of the simplest machine learning algorithms, even so, we were able to get a 94.4% accuracy rate.
After the Dojo, I spent some time getting Monodevelop and F# running on my Ubuntu 12.04 laptop. Turns out I needed to get Monodevelop 4.0 (built from source) and F# 3.0 (initially tried F# 3.1, but Monodevelop 4.0 wasn’t happy with that version). Then I set about reimplementing the algorithm in F# and after I got that working I reimplemented the same algorithm in OCaml.
Here’s my F# implementation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
Timing when running this on my Dell XPS 13 (i7 Haswell, 8GB RAM):
real 0m22.180s user 0m22.253s sys 0m0.116s
And the OCaml implementation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
The code is very similar. The OCaml version is a bit longer because I needed to implement some functions that are built into the F# library, specifically, minBy (Array.minBy is built in to thd F# standard lib), list_sum (Array.sum is the F# builtin function) and read_lines (File.ReadAllLines is the F# builtin). One difference that could possibly effect performance: The F# version reads the data into Arrays whereas the OCaml version reads the data into Lists. The OCaml version was just easier to read into a List and OCaml’s List module has a map2 function whereas OCaml’s Array module does not (F# apparently has a map2 for both Array and List).
(EDIT: But as it turns out, using a List here instead of an Array is probably responsible for about ½ of the performance difference between the F# and the OCaml version. See the update )
I was surprised how much less performant the OCaml version was:
real 0m47.311s user 0m46.881s sys 0m0.135s
The F# implementation was slightly more than twice as fast o_O.
Next, I recalled from the dojo that it’s very easy to parallelize the maps in F#; instead of Array.map you use Array.parallel.map. So after a couple of edits to add parallel maps, I rebuilt and ran with the following time:
real 0m16.400s user 0m47.135s sys 0m0.240s
So after that very simple change the F# version was very nearly 3X faster than the OCaml version. And in top I could see that the CPU usage was up to 360% as it ran.
Ah, but what about OCaml’s Parmap? I modified my OCaml version to use Parmap… but it took a lot longer to get it working than parallelizing the F# version. Probably about 30 minutes by the time I got it figured out (including figuring out how to compile it).
Here’s the OCaml version of the classify function and num_correct that uses Parmap:
1 2 3 4 5 6 7
Notice that you have to tell it how many cores you have.
After I got it to compile I ran it with much anticipation…
real 1m43.140s user 5m38.809s sys 0m42.811s
… over twice as slow as the original OCaml implementation. I noticed in top that this version launched 4 different classifyDigitPar processes so I’m guessing that it uses sockets to communicate between the processes. For this particular problem that doesn’t appear to be effective. Perhaps it would be for a longer running example, but I’m not sure. F# certainly wins in the parallelization department.
Other than performance (where F# on Mono seems to win handily) I noticed the following:
- The Array slicing syntax in F# is quite nice.
- F#’s standard library is just more comprehensive. I could have probably used Janestreet’s Core library for OCaml which is quite comprehensive, but I wanted to stick with the standard library for both.
- Monodevelop has a vi editing mode. It’s kind of ok. But it’s missing a lot of keybindings that you’d have in native vi.
- I have no idea what F#’s (or Monodevelop’s) equavilent to opam is. I’m guessing OCaml wins in the package management department.
- F#’s polymorphic math operators were nice (no need for +., etc.).
You can find the code above in my github
I want to implement the algorithm in Julia. I’m going to guess that Julia’s vectorization support means that it will beat both OCaml and F#.